Intelligent adaptive transport layer to enhance performance using multiple channels

ABSTRACT

A set of connections is established, continuously evaluated and maintained between two endpoints of a computer network for use in transmitting information flows in a more efficient and controlled manner. New connections are established and existing connections are terminated in a continual search for connections with better and/or different performance characteristics. Each connection may utilize the same or a different path through the network and may have performance characteristics that change over time. Several paths can be used simultaneously for a given information flow to improve network metrics including: throughput, transaction time, data consistency, latency and packet loss. Flows of information can be broken into one or more sub-flows and sub-flows can be assigned to one or more active connections. Furthermore, dynamic decisions regarding how flows are broken up and how they are assigned to connections can be made in response to network conditions. Through the use of these connections, a reduced cost can be offered and application QoS/QoE can be guaranteed, allowing existing networks such as the public Internet to provide an enterprise class connection, which can be used to accelerate enterprise cloud adoption without modifying the present Internet infrastructure.

RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 15/626,130, filed on Jun. 18, 2017, now U.S. Pat. No. 10,868,752, issued on Dec. 15, 2020, which claims priority to U.S. Provisional Application No. 62/351,953, filed on Jun. 18, 2016, both of which are incorporated herein by reference.

BACKGROUND

This invention relates to the field of computer networking, and more specifically to controlling network metrics such as latency, flow completion time (FCT) and throughput between endpoints of a packet-switched network. This includes networks such as the public Internet, private networks, and 3G/4G/5G mobile networks.

The Internet provides excellent connectivity while ensuring properties such as resilience, decentralization, and best-effort packet delivery. However, these characteristics mean that the Internet core runs at low utilization so that it can absorb traffic peaks. In addition, the Internet is not generally deterministic, which inhibits its use for critical applications. Enterprises typically deploy private networks to guarantee metrics of interest such as throughput and latency. However, private networks carry large operational expenses (OPEX) and capital expenditures (CAPEX), and not all enterprises can afford them. Virtual Private Networks (VPNs) use the Internet as the underlying technology to emulate the benefits of private networks. VPNs rely on tunneling techniques to provide security and performance over a public network. However, VPNs typically use a single tunnel to transmit information without guaranteeing network metrics, a critical aspect for enterprises. Some systems apply a label switching layer, such as Multiprotocol Label Switching (MPLS), to prioritize traffic, but this only works within the network of a specific carrier. What is needed is an improved method of controlling metrics such as throughput while guaranteeing Quality of Service (QoS) or Quality of Experience (QoE) when the traffic traverses the public Internet.

The Internet Protocol suite (TCP/IP) provides an end-to-end framework for communication, specifying how data is sent from one point to another. This model is commonly presented through the seven-layer OSI architecture or a four-layer scheme (link, internet, transport, application). This system has allowed the Internet to scale as the number of endpoints has grown rapidly while keeping its cost low. Today, the public Internet is one of the primary systems on which a vast number of services and applications rely. Many companies use the Internet to provide services and to manage their infrastructure.

A significant disadvantage of the public Internet is that it is not generally possible to offer deterministic services guaranteeing network metrics such as latency and throughput. This has led several institutions and companies to build private networks to ensure a certain Quality of Service (QoS). Such networks are also connected to the public Internet, but only for minimal purposes or for non-critical services through firewalls. These are expensive deployments, since their deployment and maintenance are handled by large private companies. In contrast, the public Internet is a network of networks with a shared infrastructure around the world operating under consolidated protocols.

These private networks, whether physical or virtual, rely on different network protocols and technologies to interconnect their endpoints. For instance, Multiprotocol Label Switching (MPLS), described in Internet Engineering Task Force (IETF) Request for Comments (RFC) 3031, incorporated herein by reference, offers controllable performance and reliability compared to standard Internet connections thanks to virtual dedicated communication channels. Among other optimization techniques, it prioritizes traffic based on parameters such as the application type. However, MPLS deployments typically come at a much higher cost, on the order of 100 times more per Mbit. See for example “What is the cost of MPLS?,” Mushroom Networks Blog, Aug. 20, 2015, incorporated herein by reference.

In recent years, the appearance of Cloud Computing has aggravated the interconnection problem. See for example “A View of Cloud Computing,” M. Armbrust et al., incorporated herein by reference. Cloud Computing externalizes a flexible infrastructure offered as an on-demand service, where customers pay only for the resources actually used. However, there is a critical point in the Cloud Computing model: the communication between enterprises, where the data is generated and consumed, and datacenters, where the data is sometimes processed and stored. This has inhibited the consolidation of the Cloud Computing paradigm. According to some industry leaders, only five percent of workloads are in the public cloud. The main reasons are security, lock-in cost, data privacy, and network costs, aggravated by the reluctance of IT teams.

Several pieces are still missing to achieve the potential of Cloud Computing, such as a network capable of managing critical applications and the ability to ensure certain boundaries for non-critical applications. Currently, only large companies can pay for the dedicated links needed for private Wide Area Networks (WANs). Furthermore, these networks present scalability problems since they rely on point-to-point connections instead of a packet switching network, giving up the main benefits of a network such as the public Internet. Lately, the largest Cloud Service Providers have tackled this problem by offering solutions to connect private datacenters to their public clouds. For example, Amazon Web Services offers Direct Connect and Microsoft Azure offers ExpressRoute. However, these solutions address the problem by using private connections to their clouds. Thus, the problem of a network connection with low cost and excellent scalability that maintains reliability remains unsolved.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an example of an Enterprise scenario.

FIG. 2 illustrates an example of an Enterprise scenario.

FIG. 3 illustrates an example of a Cloud scenario.

FIG. 4 illustrates an example of a Cloud scenario.

FIG. 5 illustrates an example of a Global scenario.

FIG. 6A illustrates a simple MPTCP example.

FIG. 6B illustrates a comparison of standard TCP and MPTCP protocols.

FIG. 7 illustrates an exemplary embodiment of a branch office connected to enterprise headquarters.

FIG. 8 illustrates a division of a flow inside multiple CSPs.

FIG. 9A illustrates CSPs crossing different ISPs and the core of the Internet.

FIG. 9B illustrates different paths over the Internet.

FIG. 10 illustrates the CSP state diagram.

FIG. 11 illustrates the three-layer CSP management architecture.

FIG. 12 illustrates the system architecture from a single enterprise perspective.

FIG. 13 illustrates the system architecture from a global perspective with two different enterprises.

FIG. 14A illustrates the system architecture from a global perspective with two different enterprises including an inter-enterprise mid-layer.

FIG. 14B illustrates a more complex mid-layer having a specific sub-layer for each enterprise plus a combined sub-layer.

FIG. 15 illustrates learning components inside the architectural layers.

FIG. 16 illustrates a hierarchical learning architecture including inputs and outputs for each level.

DETAILED DESCRIPTION

Embodiments of the present invention make intelligent use of the public Internet to guarantee application Quality of Service (QoS) or Quality of Experience (QoE) using its current infrastructure. While Quality of Service (QoS) objectively measures service parameters (such as packet loss rates or average throughput), Quality of Experience (QoE) is a different but related concept that measures a customer's experience with a service (e.g., web browsing, a phone call, a TV broadcast, or a call to a Call Center). The key premise is to retain the Internet's low price and scalability while simultaneously providing guaranteed QoS/QoE.

Using embodiments of the present invention, a set of connections is established between multiple endpoints to transmit information in a more efficient and controlled way. Through these connections, a reduced cost can be offered and application QoS/QoE can be guaranteed for network metrics such as throughput, packet loss and latency. Employing embodiments of the present invention allows existing networks such as the public Internet to provide an enterprise class connection, which can be used to accelerate enterprise cloud adoption without modifying the present Internet infrastructure.

Embodiments of the present invention are based on the observation that there is unused capacity in most networks, especially in the Internet. In order to absorb traffic peaks, routers in the core are overprovisioned and therefore run at a low utilization level. This means that, except in periods of congestion, there is plenty of unused bandwidth. To use this extra bandwidth, embodiments can use multipath protocols: several paths are used simultaneously to improve different network metrics including throughput, transaction time, data consistency, latency, and packet loss. Furthermore, dynamic decisions are made in response to network conditions.

The following acronyms are used herein:

CSP Certified Starflow™ Path

QoE Quality of Experience

ASP Aggregated Secured Paths

BW Bandwidth

QA Quality Assurance

LCI Local Contextual Information

GCI Global Contextual Information

In this specification the term Internet is used to mean the public Internet, or any other packet switched network utilizing Internet Protocols, whether public or private. Embodiments of the present invention can be employed under a variety of different scenarios. Described below are some of the scenarios of interest.

Scenario 1: Enterprise: Office Interconnection

In this scenario, depicted in FIG. 1, Branch Offices (100, 110, 120, 130) establish connections to their Headquarters HQ (140). Multiple Routers (e.g., 160) on the public Internet (150) are used to establish connections. The private links currently used are no longer required because embodiments of the present invention offer QoE guarantees. Rather than relying on private links, intelligent policies take advantage of the existing overcapacity of the Internet, 3G/4G/5G mobile networks and/or private networks using multipath techniques. This solution results in a virtual private WAN where companies experience high performance while reducing OPEX and CAPEX.

In FIG. 2, the Enterprise scenario is illustrated with mobile devices. Branch Offices (200, 210) as well as Mobile Devices (220, 230, 240) are coupled to the public Internet (250) through Routers (260) and to Headquarters (HQ) (270). Mobile devices add a new dimension to the office interconnection scenario. In this case, devices are not confined within the boundaries of physical locations but follow different mobility patterns. For example, a mobile device establishes a connection with HQ while its user is in a specific branch office. Subsequently, that employee may visit a customer in another location, taking the device along. The present invention handles such situations without cutting the connection or degrading the QoE. Furthermore, additional constraints may apply based on enterprise policies. One company policy may specify that all devices must first connect to an office in order to use a firewall. Another company may allow mobile devices to connect directly to HQ. Although these are high-level situations, it is clear that the number of mobile devices and their mobility patterns will impact the solution presented by the present invention.

Scenario 2: Cloud: Connecting Cloud Datacenters with Clients

FIG. 3 depicts a scenario involving the Cloud. Rather than connecting branch offices to the headquarters, multiple clients (300, 310, 320, 330) are coupled to a public Cloud datacenter (340) through Routers (360) on the public Internet (350). There are two main schools of thought in the Cloud space. Some leading companies (e.g., Amazon and Google) focus exclusively on the public Cloud, which runs all the processing and stores the data. In contrast, other companies (e.g., Microsoft) advocate a hybrid Cloud model, which allows the workload and storage to be partitioned between the public Cloud and private enterprise clouds.

Embodiments of the present invention use the same underlying technology in the Cloud scenario as in the Enterprise scenario. Each endpoint (client or cloud) uses multipath technology to enable multiple paths simultaneously, thus ensuring the required access QoE.

Mobile devices can also be considered Cloud clients. FIG. 4 illustrates a Cloud scenario incorporating mobile devices. Clients (400, 410, 420) and Mobile Clients (430, 440) are coupled together using the public Internet (450) through Routers (460) and to the Cloud Datacenter (470). Key differences between the Cloud and Enterprise scenarios include the wide variety of application types, traffic patterns, resource sharing and the administration of those resources, among others.

Scenario 3: Global Scenario

FIG. 5 illustrates a Global or All-to-All scenario. This scenario results from the combination of the Enterprise and Cloud scenarios, including mobile devices. Clients (500, 520, 530, 540) and Mobile Clients (510, 550), along with Enterprise Headquarters (570) and Cloud Datacenter (580), are coupled to the public Internet (590) through Routers (560). In this scenario, embodiments of the present invention support different connections between endpoints (e.g., offices, HQ, mobile devices, Public Clouds, etc.) while guaranteeing different QoEs based on the needs of the clients.

INTRODUCTION

Internet Transport Protocols were originally designed with resilience, robustness, and stability in mind. In addition, they operate on a per-link basis, since the visibility of each router is limited to its neighbors. Thus, routing decisions are based on local conditions rather than on the status of the entire network. This architectural decision greatly contributes to Internet scalability but penalizes performance on other metrics such as throughput and latency.

Models such as the Cloud rely on the public Internet to provide their services, so network metrics have a large impact on overall performance. In addition, new problems arise, such as lack of control over the data. Despite these drawbacks, companies are motivated to use public Clouds to exploit factors such as their ease of installation, flexible instances, better cost effectiveness, and availability.

Embodiments of the present invention employ a network solution that provides superior performance using existing public Internet infrastructure. To this end, optimized routing techniques are used that exploit local and global context information over different connection types to ensure a Quality of Experience (QoE) for end users.

The key technique employed is the utilization of multiple paths totransport information packets in an intelligent way. The capabilities ofthe aggregated paths include increased throughput, enhanced security,reduced latency, reduced packet losses and greater reliability.

Protocols

A desirable solution for companies facing the scenarios discussed above is to obtain the benefits of a private network while using the public Internet as the underlying infrastructure. This approach would reduce deployment and management costs. The technique that enables this relies in part on the implementation of tunnels, which add robustness, integrity and security to the traffic transported over the public Internet.

Embodiments of the present invention use Virtual Private Networks (VPNs) to implement tunnels. VPNs are discussed in Internet Engineering Task Force (IETF) Request for Comments (RFC) 2764, incorporated herein by reference. In one embodiment of the present invention, OpenVPN (https://openvpn.net/) is used. In an alternative embodiment, an IPsec implementation is used. IPsec is described in Internet Engineering Task Force (IETF) Request for Comments (RFC) 6071, incorporated herein by reference.

Embodiments of the present invention use VPN tunnels to transport data confidentially using encryption. In an alternative embodiment, the tunnels are not encrypted, which reduces the computation time required to generate packets and increases performance.

Embodiments of the present invention exploit the underlying public Internet infrastructure to their advantage. The Internet was designed and works as a packet switching network, and accordingly there is potentially a large number of paths between a source and a destination. While embodiments of the present invention utilize tunneling to discover and maintain multiple paths, alternative embodiments utilize other technologies such as connection pooling. Connection pooling consists of maintaining a pool of always-active connections and reusing them to transport application data. The difference between connection pooling and tunneling is that tunnels transport unmodified packets with their headers, while connection pooling transports only the payload of the packets, so additional details may need to be sent out-of-band or through a custom protocol. In one embodiment of the present invention, TCP pooling is used to transport TCP data over the Internet.

Multipath Transport allows a set of different paths to be used without requiring any modification or reconfiguration of the network equipment. That is, Multipath Transport is transparent to the routers. Embodiments of the present invention combine the advantages of Multipath Transport with VPNs to simultaneously transport data over a pre-established and characterized set of tunnels. These VPN tunnels can carry packets of any protocol at the OSI data link layer or above, such as Ethernet, ARP, ICMP, IP, TCP, or UDP, among others.

Many techniques can exploit multiple paths, for example replication and dispersion. Replication duplicates the information flows and sends the copies over different paths to obtain the best performance. The increased reliability comes at a cost: this technique has a large overhead, since the same traffic sent multiple times interferes with itself and congests the routers. In contrast, dispersion breaks an information flow into sub-flows and sends them over different paths. This technique reduces congestion but can aggravate the out-of-order delivery problem. If the paths are used intelligently according to the specific requirements of the transported traffic, performance can be increased. Both techniques share a common overhead for creating the multiple paths.
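The difference in overhead between the two techniques can be made concrete with a small sketch. It assumes packets are opaque items and paths are named strings; dispersion here uses weighted round robin, which is only one possible splitting policy and not necessarily the one used by embodiments of the invention.

```python
from typing import Dict, Hashable, List, Sequence


def replicate(packets: Sequence[Hashable], paths: Sequence[str]) -> Dict[str, List[Hashable]]:
    """Replication: every packet is sent on every path."""
    return {path: list(packets) for path in paths}


def disperse(packets: Sequence[Hashable], weights: Dict[str, int]) -> Dict[str, List[Hashable]]:
    """Dispersion: each packet goes to exactly one path (weighted round robin)."""
    # Expand the weights into a repeating schedule, e.g. {"A": 2, "B": 1} -> ["A", "A", "B"].
    schedule = [path for path, weight in weights.items() for _ in range(weight)]
    if not schedule:
        raise ValueError("at least one path needs a positive weight")
    assignment: Dict[str, List[Hashable]] = {path: [] for path in weights}
    for i, packet in enumerate(packets):
        assignment[schedule[i % len(schedule)]].append(packet)
    return assignment


def packets_sent(assignment: Dict[str, List[Hashable]]) -> int:
    """Total packets injected into the network, summed over all paths."""
    return sum(len(v) for v in assignment.values())


packets = list(range(6))
rep = replicate(packets, ["A", "B"])
disp = disperse(packets, {"A": 2, "B": 1})
print(packets_sent(rep))   # 12: every packet crosses the network twice
print(packets_sent(disp))  # 6: no duplication
print(disp)                # {'A': [0, 1, 3, 4], 'B': [2, 5]}
```

Replication doubles the injected load for two paths, while dispersion sends each packet once but delivers consecutive packets over paths with different delays, which is where the out-of-order problem comes from.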

In addition to exploiting the multi-connectivity of the Internet, embodiments of the current invention also tune VPN and TCP protocol parameters to improve throughput and performance in general. For example, an embodiment could take advantage of the abstraction provided by the VPN tunnels to simulate a LAN using huge Maximum Transmission Unit (MTU) values (even bigger than 9 KB Ethernet jumbo frames, e.g., a 48 KB MTU). The TCP Maximum Segment Size (MSS) then adapts to this setting, allowing big frames to be injected into the tunnel virtual interface; these huge frames are then fragmented into multiple IP fragments by routers on the path or by the physical/virtual host generating them. In this way, the TCP congestion control algorithm on the sender shows faster growth of its sending window, resulting in consistently higher throughput.
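The arithmetic behind the large-MTU example can be sketched as follows. It assumes IPv4 with no IP or TCP options, reads "48 KB" as 49152 bytes, ignores the tunnel's own encapsulation overhead, and models slow start in its idealized form (the window doubles every RTT from an initial window of 10 segments, as in RFC 6928). These are simplifying assumptions, not parameters stated in the original text.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def tcp_mss(mtu: int) -> int:
    """MSS a TCP stack would derive from an interface MTU (IPv4, no options)."""
    return mtu - IPV4_HEADER - TCP_HEADER


def ipv4_fragments(datagram_len: int, path_mtu: int) -> int:
    """Fragments needed to carry one IPv4 datagram (header included) over path_mtu."""
    if datagram_len <= path_mtu:
        return 1
    payload = datagram_len - IPV4_HEADER
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (path_mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(payload / per_fragment)


def slow_start_cwnd_bytes(mss: int, rtts: int, initial_segments: int = 10) -> int:
    """Idealized slow-start window in bytes after `rtts` round trips (doubling per RTT)."""
    return initial_segments * mss * 2 ** rtts


tunnel_mtu = 48 * 1024  # "48 KB" read as 49152 bytes (assumption)
print(tcp_mss(1500), tcp_mss(tunnel_mtu))   # 1460 49112
print(ipv4_fragments(tunnel_mtu, 1500))     # 34
print(slow_start_cwnd_bytes(1460, 3), slow_start_cwnd_bytes(tcp_mss(tunnel_mtu), 3))  # 116800 3928960
```

Because the congestion window grows in units of segments, a segment roughly 34 times larger makes the window grow roughly 34 times faster in bytes per RTT, at the price of each segment being split into 34 fragments on a standard 1500-byte path.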

Different protocols can be used to exploit these advantages, although TCP and UDP are the main ones. Embodiments of the current invention transport packets through the tunnels. Tunnel selection and packet scheduling are based on tunnel metrics that match packet prioritization requirements.

MPTCP

One approach to Multipath Transport, Multipath TCP, is described in “Architectural Guidelines for Multipath TCP Development,” Internet Engineering Task Force (IETF) Request for Comments (RFC) 6182, incorporated herein by reference. Multipath TCP (MPTCP) is an extension of TCP that enables the use of multiple paths. FIG. 6A illustrates a simple example of MPTCP. Two connections (A1, A2) couple Host A (600) to the Internet (610) and two connections (B1, B2) couple Host B (620) to the Internet (610). Each path from Host A to Host B is uniquely identified by the 5-tuple consisting of the source and destination IP addresses, the source and destination port numbers, and the transport protocol used.
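The FIG. 6A example can be modeled as a full mesh of sub-flows, each carrying its own 5-tuple. This is a minimal sketch: the documentation addresses standing in for A1, A2, B1 and B2, the destination port, and the sequential source-port allocation are all hypothetical choices for illustration.

```python
from itertools import product
from typing import List, NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


def full_mesh_subflows(local_ips: List[str], remote_ips: List[str],
                       dst_port: int, first_src_port: int) -> List[FiveTuple]:
    """One sub-flow per (local address, remote address) pair, each with a distinct 5-tuple."""
    return [
        FiveTuple(src, dst, first_src_port + i, dst_port, "TCP")
        for i, (src, dst) in enumerate(product(local_ips, remote_ips))
    ]


host_a = ["192.0.2.1", "192.0.2.2"]        # stand-ins for A1, A2
host_b = ["198.51.100.1", "198.51.100.2"]  # stand-ins for B1, B2
subflows = full_mesh_subflows(host_a, host_b, dst_port=443, first_src_port=40000)
for sf in subflows:
    print(sf)
```

Two interfaces on each host yield four sub-flows (A1-B1, A1-B2, A2-B1, A2-B2). Distinct 5-tuples do not guarantee disjoint paths, which is the point made in the next paragraph.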

As noted above, the paths through the Internet are not necessarily disjoint. For example, A1-B1 and A1-B2 could share a common link within the network, in which case intelligent path selection will prevent congestion and cross-interference.

MPTCP has two major goals: (i) improve throughput through the concurrent use of multiple paths and (ii) improve resilience because segments can be sent over any path. The two objectives are not independent. Node failure is an extreme case in which the resilience of MPTCP becomes important. Under some conditions, MPTCP can outperform TCP. In practice, implementation details must be considered. For instance, the overhead of MPTCP may counteract its advantages in the transfer of small files.

When the use of MPTCP becomes widespread, it may also reduce congestion in the overall Internet by shifting traffic away from congested bottlenecks through better use of spare capacity.

An overview of the MPTCP architectural stack is depicted in FIG. 6B. Standard TCP (630) is illustrated on the left and MPTCP (640) on the right. The MPTCP layer handles path management, packet scheduling, the subflow interface, and congestion control. The sub-flows are standard TCP sessions that provide the underlying transport for each path. All these details are transparent to the applications.

The reference MPTCP implementation for Linux executes in kernel space. Embodiments of the present invention involve modifications to certain MPTCP kernel modules of the reference Linux implementation. These modules receive commands through Netlink from a process implemented in user space that optimizes the MPTCP connections.

In an embodiment, a new kernel module is utilized that improves point-to-point connections between clients (e.g., a branch office) and servers (e.g., HQ) instead of using the default full mesh topology. This module reduces the overhead of Multipath Transport by eliminating unused connections (e.g., between two IP ports of the same device in the same office). In an embodiment, the interface limit of the reference MPTCP Linux implementation has been increased. With this modification, a flow can use up to 32 interfaces instead of eight, the limit imposed by the original implementation. In alternative embodiments, the maximum number of interfaces and sub-flows can be greater.
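One way to picture the difference between the default full mesh and a pruned point-to-point plan is sketched below. The rule used here (skip any pair whose two endpoints belong to the same site) and the site labels are hypothetical illustrations of "eliminating unused connections"; the actual kernel module's selection logic is not specified in the text. The 32-interface limit is taken from the text.

```python
from itertools import product
from typing import Dict, List, Tuple

MAX_INTERFACES = 32  # per-flow limit described in the text (the reference implementation allowed 8)


def full_mesh(local: Dict[str, str], remote: Dict[str, str]) -> List[Tuple[str, str]]:
    """Default behaviour: one sub-flow for every (local address, remote address) pair."""
    return list(product(local, remote))


def point_to_point(local: Dict[str, str], remote: Dict[str, str]) -> List[Tuple[str, str]]:
    """Hypothetical pruning rule: drop pairs whose endpoints sit in the same site.

    `local` and `remote` map an interface address to a site label.
    """
    for addrs in (local, remote):
        if len(addrs) > MAX_INTERFACES:
            raise ValueError(f"a flow may use at most {MAX_INTERFACES} interfaces")
    return [
        (l_addr, r_addr)
        for (l_addr, l_site), (r_addr, r_site) in product(local.items(), remote.items())
        if l_site != r_site
    ]


branch = {"10.0.0.1": "branch", "10.0.0.2": "branch"}
peers = {"203.0.113.1": "hq", "203.0.113.2": "hq", "10.0.0.3": "branch"}
print(len(full_mesh(branch, peers)))       # 6
print(len(point_to_point(branch, peers)))  # 4: the two intra-branch pairs are skipped
```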

In an alternative embodiment, a full user-space implementation, together with user-space packet I/O (e.g., DPDK), can be utilized. This embodiment is more efficient than the hybrid kernel/user-space solution. It enables control of the protocol implementation and optimization of performance while reducing the overhead associated with communication from kernel to user space and vice versa. In another alternative embodiment, a full kernel-space implementation could be utilized.

In embodiments of the present invention, new versions of two MPTCP submodules are utilized: the Path Manager and the Packet Scheduler. The path manager decides how many sub-flows to establish. The new versions are able to create and close sub-flows dynamically, among other modifications described below. The scheduler assigns packets to sub-flows. The new scheduler algorithm balances the load between sub-flows and applications.
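The text does not spell out the new scheduler's algorithm. As a baseline for comparison, the sketch below implements the lowest-RTT-first policy of the default scheduler in the reference Linux MPTCP implementation: send on the sub-flow with the smallest smoothed RTT that still has room in its congestion window. The `Subflow` fields, the packet-count window, and the example values are simplifications chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Subflow:
    name: str
    srtt_ms: float      # smoothed round-trip time estimate
    cwnd: int           # congestion window, in packets
    in_flight: int = 0  # packets sent but not yet acknowledged

    def has_window_space(self) -> bool:
        return self.in_flight < self.cwnd


def pick_subflow(subflows: List[Subflow]) -> Optional[Subflow]:
    """Lowest-RTT sub-flow that still has congestion-window space, or None."""
    candidates = [sf for sf in subflows if sf.has_window_space()]
    return min(candidates, key=lambda sf: sf.srtt_ms, default=None)


def schedule(n_packets: int, subflows: List[Subflow]) -> List[str]:
    """Assign up to n_packets packets; stop when every window is full (wait for ACKs)."""
    assignment = []
    for _ in range(n_packets):
        sf = pick_subflow(subflows)
        if sf is None:
            break
        sf.in_flight += 1
        assignment.append(sf.name)
    return assignment


flows = [Subflow("fast", srtt_ms=20, cwnd=2), Subflow("slow", srtt_ms=80, cwnd=3)]
print(schedule(6, flows))  # ['fast', 'fast', 'slow', 'slow', 'slow']
```

A load-balancing scheduler of the kind the text describes would replace `pick_subflow` with a policy that also accounts for per-application requirements.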

Multipath UDP (MPUDP)

UDP is the other major transport protocol. UDP extensions supporting multipath (MPUDP) have not been standardized yet, but the underlying idea is the same: to enable UDP connections to exploit multiple paths simultaneously. Unlike TCP, UDP is not a reliable transport protocol; i.e., it leaves the handling of dropped, out-of-order and duplicated packets to the application.

MPUDP inherits UDP attributes: unreliable transport and the lack of congestion control and packet ordering guarantees. Currently, only three standard transport layer protocols implement unreliable transport: UDP, DCCP and SCTP. Since UDP does not have congestion control, it is used mainly for transfers with low throughput requirements. Depending on the application requirements, different key metrics need to be optimized. For example, in some applications reducing packet loss is very important, while in others reducing the one-way delay is more important. Multipath transport can help significantly here. For example, to reduce packet loss, the UDP traffic can be migrated to a less congested path. To reduce one-way delay, UDP traffic can be replicated and sent simultaneously over several paths. In the latter case, it is important to eliminate the duplicates on the receiver side. This could be accomplished by changing the UDP transfer to a DCCP transfer.
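Replication with receiver-side duplicate elimination can be sketched without DCCP by tagging each datagram with a sequence number. The `(sequence, payload)` framing and the bounded window of remembered sequence numbers are hypothetical choices for this sketch; the text itself suggests using DCCP, whose sequence numbers would play the same role.

```python
from typing import Dict, Iterable, List, Set, Tuple

Datagram = Tuple[int, bytes]  # (sequence number, payload): hypothetical framing


def replicate_send(payloads: Iterable[bytes], paths: List[str]) -> Dict[str, List[Datagram]]:
    """Number each payload once, then send an identical copy on every path."""
    numbered = list(enumerate(payloads))
    return {path: list(numbered) for path in paths}


class Deduplicator:
    """Delivers the first copy of each sequence number and drops later copies."""

    def __init__(self, window: int = 1024):
        self.window = window  # how many recent sequence numbers to remember
        self.seen: Set[int] = set()
        self.highest = -1

    def accept(self, dgram: Datagram) -> bool:
        seq, _ = dgram
        if seq <= self.highest - self.window or seq in self.seen:
            return False  # duplicate, or too old to tell apart
        self.seen.add(seq)
        if seq > self.highest:
            self.highest = seq
            # Forget sequence numbers that fell out of the window.
            self.seen = {s for s in self.seen if s > self.highest - self.window}
        return True


sent = replicate_send([b"a", b"b", b"c"], ["p1", "p2"])
# Copies arrive interleaved because the two paths have different delays.
arrivals = [sent["p2"][0], sent["p1"][0], sent["p1"][1],
            sent["p2"][2], sent["p2"][1], sent["p1"][2]]
dedup = Deduplicator()
delivered = [d for d in arrivals if dedup.accept(d)]
print(delivered)  # [(0, b'a'), (1, b'b'), (2, b'c')]
```

The application sees each datagram once, with the latency of whichever path delivered it first.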

Datagram Congestion Control Protocol (DCCP) implements unreliable transport with TCP-like congestion control (using session handshaking and sequence numbers). DCCP is described in Internet Engineering Task Force (IETF) Request for Comments (RFC) 4340, incorporated herein by reference.

Stream Control Transmission Protocol (SCTP) has an extension for Partial Reliability (PR), which prevents retransmission of expired data. SCTP is described in Internet Engineering Task Force (IETF) Request for Comments (RFC) 4960, incorporated herein by reference.

Among these three protocols, only DCCP and SCTP have congestion control, and only SCTP supports multi-streaming and multi-homing (multipath). Thus, two possible candidates for an efficient MPUDP would be multipath DCCP (manually implementing the path manager and the packet scheduler) or SCTP with Partial Reliability and multi-homing. An approach of the former kind is presented in “Packet Scheduling and Congestion Control Schemes for Multipath Datagram Congestion Control Protocol,” by C. Huang, Y. Chen and S. Lin, incorporated herein by reference. An approach of the latter kind is presented in “Partially Reliable-Concurrent Multipath Transfer (PR-CMT) for Multihomed Networks,” by C. Huang and M. Lin, incorporated herein by reference.

In addition, some UDP-based applications are sensitive to packet order, a problem that multipath delivery can aggravate. Both DCCP and SCTP support packet reordering to avoid this problem.
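A receiver-side reorder buffer illustrates how packets spread over several paths can be handed to an order-sensitive application in sequence. This sketch assumes each packet carries a sequence number and, for brevity, waits indefinitely for gaps; an unreliable transport would also need a timeout that skips sequence numbers presumed lost.

```python
import heapq
from typing import List, Tuple


class ReorderBuffer:
    """Releases packets to the application strictly in sequence order."""

    def __init__(self, first_seq: int = 0):
        self.next_seq = first_seq
        self.heap: List[Tuple[int, bytes]] = []  # min-heap keyed by sequence number

    def push(self, seq: int, payload: bytes) -> List[bytes]:
        """Buffer one arriving packet and return whatever is now deliverable in order."""
        if seq < self.next_seq:
            return []  # late duplicate of something already delivered
        heapq.heappush(self.heap, (seq, payload))
        released = []
        while self.heap and self.heap[0][0] <= self.next_seq:
            top_seq, data = heapq.heappop(self.heap)
            if top_seq == self.next_seq:
                released.append(data)
                self.next_seq += 1
            # top_seq < next_seq: a buffered duplicate, discarded
        return released


buf = ReorderBuffer()
print(buf.push(1, b"b"))  # []: packet 0 has not arrived yet
print(buf.push(0, b"a"))  # [b'a', b'b']
print(buf.push(2, b"c"))  # [b'c']
```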

Metrics

Tunnels utilized in embodiments of the present invention are evaluated in terms of network metrics, including the following:

Bandwidth (BW): Bandwidth is defined as the theoretical maximum amount of data that can be transmitted in a fixed amount of time. Thus, bandwidth represents the capacity of a network connection for supporting data transfers. Bandwidth is often expressed in bits per second (bps, Kbps, Mbps, Gbps).

Packet Loss: In a packet-switched system, packet loss refers to the number of packets that fail to arrive at their intended destination. The main factors that cause packet loss are link congestion, device performance issues (router, switch, etc.) such as buffer overloads, software issues on network devices, and faulty hardware. Dropping is the deliberate discard of a packet.

Reliability: Reliability describes the ability of a system or componentto function under stated conditions for a specified period of time.

Throughput: Throughput is how much data actually travels through the ‘channel’ successfully. It can be limited by several factors, including latency, packet loss, and the protocol being used. Throughput is usually measured in bits per second (bps, Kbps, Mbps, Gbps).

Latency: Latency is defined as the time elapsed from when an application generates data to transmit until that data arrives at the destination application to be processed. Latency in packet switched networks can be affected by many different factors, such as processing delay, bufferbloat and queueing delays, especially in the operating environment of long-distance networks.

Jitter: Jitter is the absolute value of the difference between the forwarding delays of two consecutive received packets belonging to the same stream. Jitter results from network congestion, timing drift and route changes. As noted in Internet Engineering Task Force (IETF) Request for Comments (RFC) 3393, incorporated herein by reference, using the term jitter to describe the variation in packet delay in packet-switched networks is not entirely correct. Packet Delay Variation (PDV) may be a better term in this context.

Round Trip Time (RTT): Round-trip time, also called round-trip delay, isthe time required for a packet to travel from a specific source to aspecific destination and for a return packet to travel back to thesource.

Inter Packet Time (IPT): Inter Packet Time is the time elapsed between two consecutive packets within a flow. Comparing Inter Packet Arrival Times (IPAT) with Inter Packet Emission Times (IPET) provides a convenient and efficient way to compute jitter. Through heuristics, it is possible to predict jitter by assessing only IPAT.
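The jitter definition above and the IPAT/IPET comparison give the same numbers, since for consecutive packets the difference in one-way delay equals the arrival gap minus the emission gap. The sketch below computes both forms; the timestamps are illustrative values in integer milliseconds.

```python
from typing import List, Sequence


def ipdv_from_delays(send: Sequence[int], recv: Sequence[int]) -> List[int]:
    """|D(i) - D(i-1)|, where D(i) = recv[i] - send[i] is the one-way delay."""
    delays = [r - s for s, r in zip(send, recv)]
    return [abs(delays[i] - delays[i - 1]) for i in range(1, len(delays))]


def ipdv_from_ipat_ipet(send: Sequence[int], recv: Sequence[int]) -> List[int]:
    """|IPAT(i) - IPET(i)|, using only timestamp differences taken on each host."""
    ipet = [send[i] - send[i - 1] for i in range(1, len(send))]
    ipat = [recv[i] - recv[i - 1] for i in range(1, len(recv))]
    return [abs(a - e) for a, e in zip(ipat, ipet)]


send_ms = [0, 20, 40, 60]    # emission times at the sender
recv_ms = [50, 72, 89, 115]  # arrival times at the receiver
print(ipdv_from_delays(send_ms, recv_ms))     # [2, 3, 6]
print(ipdv_from_ipat_ipet(send_ms, recv_ms))  # [2, 3, 6]
```

Because the IPAT/IPET form only subtracts timestamps taken on the same host, a constant offset between sender and receiver clocks cancels out, so the two clocks do not need to be synchronized.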

Flow Completion Time (FCT): Flow Completion Time is the time required to perform a successful transaction using a network flow. The type of transaction and its correctness are application-dependent.

Observable Connection Path: The Observable Connection Path is themeasurable set of nodes traversed by the constituent packets of a flow(e.g. a persistent tunnel connection between two endpoints).

Certified Starflow™ Paths

An important concept for embodiments of the present invention is the “Certified Starflow™ Path”, or CSP. CSPs are persistent connections opened between two endpoints of the network. In embodiments of the invention, this persistent connection is implemented by means of a VPN tunnel. Once the connection is opened and kept alive, metrics such as the ones described above start being monitored. Once a tunnel meets certain thresholds over a desired interval, it is promoted to the status of CSP. Similarly, a CSP may be demoted if it fails to maintain certain thresholds for a desired interval. Different algorithms govern the promotion and demotion process according to the required QoEs as well as other factors. For example, if a company has a QoE that requires a throughput higher than 10 Mbps, only the VPN tunnels that exceed this threshold will be promoted to CSPs. In alternative embodiments, CSPs do not use security or encryption, and may be implemented with any tunneling technique.
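The promotion and demotion rule can be sketched as a two-state monitor over a sliding window of throughput samples, using the 10 Mbps example above. The window length standing in for the "desired interval" and the rule of requiring every sample in the window to be above (or at or below) the threshold are hypothetical choices; the full state diagram of FIG. 10 is not reproduced here.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum
from typing import Deque


class TunnelState(Enum):
    CANDIDATE = "candidate"  # tunnel open and monitored, not certified
    CSP = "csp"              # promoted to Certified Starflow Path


@dataclass
class CspMonitor:
    min_throughput_mbps: float = 10.0  # QoE threshold from the example in the text
    window: int = 5                    # hypothetical: samples that make up the interval
    state: TunnelState = TunnelState.CANDIDATE
    samples: Deque[float] = field(init=False)

    def __post_init__(self) -> None:
        self.samples = deque(maxlen=self.window)

    def observe(self, throughput_mbps: float) -> TunnelState:
        """Record one throughput measurement and apply promotion/demotion."""
        self.samples.append(throughput_mbps)
        if len(self.samples) < self.window:
            return self.state  # the interval is not yet covered
        if all(s > self.min_throughput_mbps for s in self.samples):
            self.state = TunnelState.CSP        # sustained above threshold: promote
        elif all(s <= self.min_throughput_mbps for s in self.samples):
            self.state = TunnelState.CANDIDATE  # sustained below threshold: demote
        return self.state                       # mixed samples: keep current state


monitor = CspMonitor(window=3)
for mbps in [12, 11, 15, 8, 9, 7]:
    print(mbps, monitor.observe(mbps).value)
```

Requiring the whole window to agree before changing state adds hysteresis, so a single slow or fast sample does not make a tunnel flap between states.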

A path in a packet-switched network such as the Internet can be defined as the set of hops traversed by the packets exchanged between two endpoints. This set depends on the forwarding decisions taken at each hop. Routers in the Internet tend to forward packets belonging to the same flow along the same route by uniquely identifying the flow (e.g., using the five-tuple as a flow identifier) and maintaining the corresponding routing state while the flow is active; this is called flow stickiness.

Embodiments of the invention exploit the dynamic multiple paths offered by the Internet. In an embodiment of the present invention, multiple ISPs could be used to connect two endpoints, implicitly providing multiple paths between them, as these paths cross distinct administrative domains. Even with a single ISP per endpoint, traffic flowing between the two endpoints can traverse distinct paths, because some routers on the path have multiple outgoing routes towards the same destination. Routers can exploit this multiplicity of next hops to load-balance traffic, improving performance and reducing congestion. To avoid breaking connection-oriented protocols (such as TCP) in a multipath routing situation, routers also tend to send packets belonging to the same flow towards the same next hop, enforcing flow stickiness. Embodiments of the present invention proactively maintain connections between endpoints, regardless of the agents that established these connections, in a process defined as “Path Fishing”.

Discovering CSPs with distinct routes between the same two endpoints depends on how routers enforce per-flow load balancing. To identify packets belonging to a specific flow and route them towards the same next hop, routers usually compute a hash value over some per-flow invariant fields. An example is the 5-tuple of a UDP/TCP packet: source IP, destination IP, source port, destination port, and transport protocol identifier. In an embodiment of this invention, UDP tunnels are used. Other embodiments could use IPsec or other tunneling techniques. In some cases no source or destination ports are present (e.g., in an IPsec packet in tunnel mode with an ESP header, which is the most common IPsec tunnel-mode configuration). In this case, routers on the path could hash other header bytes together with the source and destination IP addresses to maintain a stable path for all the packets belonging to the same IPsec tunnel.
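Per-flow load balancing can be illustrated with a toy router model. In the sketch below, SHA-256 is only a stand-in for whatever (often vendor-specific) hash a real router applies to the 5-tuple, and the function name, next-hop labels, addresses, and ports are hypothetical. It shows the two properties the text relies on: packets of one flow always map to the same next hop, and changing only the source port of a new tunnel can move it onto a different next hop.

```python
import hashlib

def next_hop(five_tuple, next_hops):
    """Pick a next hop by hashing per-flow invariant fields (toy router model)."""
    key = "|".join(str(field) for field in five_tuple).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

hops = ["R1", "R2", "R3"]
flow = ("198.51.100.7", "203.0.113.9", 40001, 1194, "udp")

# Flow stickiness: every packet of the same flow hashes to the same next hop.
assert next_hop(flow, hops) == next_hop(flow, hops)

# Path fishing: varying only the source port spreads new tunnels over next hops.
choices = {next_hop(("198.51.100.7", "203.0.113.9", p, 1194, "udp"), hops)
           for p in range(40000, 40100)}
print(sorted(choices))
```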

A discovered CSP is characterized by a set of relevant metrics, the path being one of them. The characterization can be either active or passive: the former uses active probes to retrieve the measurements, while the latter gathers the measurements from the actual traffic being transported. Even if multiple CSPs with equivalent paths are discovered, they can behave differently in terms of other metrics. The number of CSPs between two sites is dynamic and intelligently managed in accordance with the required QoE.

In one embodiment of the invention, the number of CSPs is limited in part by the number of unique tunnel identifiers. For example, if a 5-tuple technique is used, the number of CSPs from a given source to a given destination is determined by the number of unique 5-tuples that can be established between the endpoints.

FIG. 7 illustrates an exemplary embodiment of a branch office connected to enterprise headquarters. In this example, the branch office has two different ISP connections: one ISP (700) with a single IP address with two transport ports available, and another ISP (710) with two IP addresses with three and two transport ports, respectively. On the other side of the connection, the headquarters has a single ISP (720) with one IP address and two transport ports.

On top of this network setup, all the possible CSPs (730) between the locations are also shown. Note that CSPs within the branch office are not displayed. Although this is a simple network topology, enterprises may require more complex topologies to interconnect their offices, and the topology in turn determines the number of CSPs.

In this example, the IP/transport-port limitation allows for 14 possible CSPs, which can be expressed as the product of the total number of IP/transport-port pairs available at the branch office (2 + 3 + 2 = 7) and the total number of IP/transport-port pairs available at the headquarters (2), i.e., 7 × 2 = 14. Thus, embodiments of the present invention can potentially transfer traffic over those 14 CSPs.
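The count of 14 can be checked by enumerating the FIG. 7 topology. Each CSP joins one IP/transport-port pair at the branch office with one at headquarters: the branch office offers 2 + 3 + 2 = 7 pairs and headquarters offers 2, giving 7 × 2 = 14 combinations. The address labels in this sketch are hypothetical placeholders for the IP addresses in the figure.

```python
from itertools import product

# FIG. 7 topology: (IP address label, number of usable transport ports) per site.
branch = [("ISP700-IP1", 2), ("ISP710-IP1", 3), ("ISP710-IP2", 2)]
headquarters = [("ISP720-IP1", 2)]

def endpoints(site):
    """Expand each (IP, port count) entry into individual IP/transport-port pairs."""
    return [(ip, port) for ip, count in site for port in range(count)]

csps = list(product(endpoints(branch), endpoints(headquarters)))
print(len(endpoints(branch)), len(endpoints(headquarters)), len(csps))  # 7 2 14
```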

While network configurations limit the number of CSPs, the number of flows under transmission depends on users and their applications. The number of flows is also limited by the capacity of the CSPs, that is, the bandwidth of each connection. Furthermore, flows are divided into sub-flows to increase the granularity with which the system distributes traffic over the available CSPs.

In one embodiment of the present invention, a flow can use up to 32 different CSPs concurrently, where each CSP can handle a subset of the sub-flows of that flow. In alternative embodiments, the number of different CSPs that can be used can be greater than 32. This is illustrated in FIG. 8: a single flow (800) is divided into up to 32 sub-flows (810) that are transmitted over up to six CSPs (820).
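As in FIG. 8, 32 sub-flows can be spread over six CSPs so that each CSP carries a subset of them. The text does not specify the mapping policy, so the round-robin assignment below is only one hypothetical choice, and the CSP names are placeholders.

```python
from collections import Counter

def assign_subflows(num_subflows, csp_ids):
    """Round-robin mapping of sub-flow indices to CSP IDs (hypothetical policy)."""
    if not csp_ids:
        raise ValueError("at least one CSP is required")
    return {sf: csp_ids[sf % len(csp_ids)] for sf in range(num_subflows)}

mapping = assign_subflows(32, ["csp-a", "csp-b", "csp-c", "csp-d", "csp-e", "csp-f"])
load = Counter(mapping.values())
print(load["csp-a"], load["csp-f"])  # 6 5
```

With 32 sub-flows and six CSPs, two CSPs carry six sub-flows and the other four carry five; a capacity-aware policy could weight this split instead.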

In one embodiment of the present invention, the original MPTCP protocol is used, which establishes a full-mesh connection between all endpoints. In an alternative embodiment, a point-to-point topology is used to eliminate the overhead of unused tunnels, to increase the number of possible sub-flows, and to overcome limitations on flow-to-CSP assignments.

CSP Directionality

In one embodiment of the present invention, a CSP is bidirectional, a characteristic inherited from tunnels. Whether both CSP directions use the same network path depends on the network infrastructure. For example, for IP-based tunnels, the two directions do not necessarily traverse the same network path. As a result, each direction of a tunnel can have different characteristics (e.g., in terms of latency or bandwidth).

However, some applications and/or protocols may suffer from this asymmetry between the two flow directions. For example, TCP connections depend heavily on RTT (the combined latency of both directions) to ensure end-to-end QoE, because the protocol relies on returning acknowledgement (ACK) packets before forwarding new data.

In one embodiment of the present invention, each data packet can be transported through the CSP that best fits the characteristics of the application or the policies of the system at that moment. For example, a TCP connection may be initially established through a given CSP, but data packets can subsequently be sent through alternative CSPs that reach the same destination with better characteristics, such as higher bandwidth. Moreover, the corresponding ACK packets may return through any of these CSPs, not necessarily the one used by the data packet, based on other characteristics such as latency. These decisions are dynamic and may change over time and for each packet.
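A per-packet choice of the kind described above can be sketched as follows: data packets take the CSP with the most bandwidth, and ACKs take the CSP with the lowest latency. The `CSP` record, the function name, and the metric values are hypothetical, and a real policy would weigh more metrics than these two.

```python
from dataclasses import dataclass

@dataclass
class CSP:
    csp_id: str
    bandwidth_mbps: float
    latency_ms: float

def pick_csp(packet_kind, csps):
    """Send data over the widest CSP and ACKs over the lowest-latency one."""
    if packet_kind == "ack":
        return min(csps, key=lambda c: c.latency_ms)
    return max(csps, key=lambda c: c.bandwidth_mbps)

pool = [CSP("t1", 50, 40), CSP("t2", 20, 12), CSP("t3", 80, 65)]
print(pick_csp("data", pool).csp_id, pick_csp("ack", pool).csp_id)  # t3 t2
```

Because the function is evaluated per packet against current measurements, the chosen CSPs change as soon as the metrics in the pool are updated.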

In an embodiment of the present invention, CSPs are classified by means of the metrics described above. In addition, flows have a set of requirements in terms of those same metrics, plus specific characteristics defined by the application opening those flows (e.g., type of traffic, file transfer size, etc.). The system is able to select the CSPs that best serve the application's needs to forward the flow's packets (e.g., low-latency CSPs for latency-sensitive applications such as VoIP, or CSPs with high available bandwidth for a bulk data transfer).

To classify CSPs, the tunnels are continuously monitored, and the system reacts dynamically when a tunnel violates some criterion. For example, CSPs may be classified according to their throughput. In a complementary classification, CSPs may be classified according to their packet loss. For example, category A includes the CSPs with a throughput higher than a specific threshold, while category B includes the CSPs with a throughput below that threshold but with low packet loss.
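The two-category example can be written as a small classifier. The threshold values and the function name are hypothetical; the text does not give numbers for either threshold.

```python
def classify(throughput_mbps, loss_pct, tput_threshold=10.0, max_loss_pct=1.0):
    """Category A: above the throughput threshold; B: below it but with low loss."""
    if throughput_mbps > tput_threshold:
        return "A"
    if loss_pct <= max_loss_pct:
        return "B"
    return None  # meets neither criterion; a candidate for demotion

print(classify(25, 3.0), classify(5, 0.2), classify(5, 4.0))  # A B None
```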

Additionally, classification patterns can associate CSPs with certain types of traffic. For example, if large files are being sent, the chosen CSPs should have excellent throughput. Alternatively, if data reception acknowledgments are being sent, reduced latency is preferable to high throughput. Furthermore, flows can have different priorities according to user and application types, and classification can help guarantee those priorities without compromising characteristics such as real-time behavior or QoE.

Deployment

Embodiments of the present invention are deployed as “agents” that run inside a Virtual Machine (VM) that can be deployed in any device or network equipment. Where the VM is executed determines the source and destination of the CSPs. There are three VM deployment options, which have an impact on system performance: 1. execute VMs at the network equipment that connects LANs with the WAN; 2. execute VMs at certain aggregation points inside LANs; and 3. execute VMs in each device inside LANs.

In an alternative embodiment, agents are deployed using containers instead of VMs. Containers are lightweight virtualization mechanisms provided by the operating system that allow groups of processes or system resources to be isolated. For example, in Linux, applications and network devices can be isolated in namespaces.

In some cases, CSPs may traverse different network domains (e.g., distinct Autonomous Systems, AS), including routers in the Internet core. In addition, if the third option above is chosen, CSPs will also cross the LANs. FIG. 9A illustrates this situation. One endpoint (900) having two ISP connections (910) is coupled through the Internet core (920) to another endpoint (940) also having two ISP connections (930). Each CSP starts at one IP/transport-port pair at one of the endpoints and terminates at an IP/transport-port pair at the other.

Where the agents are placed contributes to the total number of CSPs, because this decision determines the number of endpoints to be connected. For example, if the first option above is used, deploying the VMs at the ingress and egress of the LANs, the number of CSPs may be reduced due to the limited number of CSP identifiers available at the ingress and egress points. This restriction stems from the number of transport ports the endpoint opens publicly, since open ports raise security concerns. In contrast, if VMs are deployed on all devices inside the LANs, the number of CSPs can be larger, since more CSP identifiers would typically be available. This choice also has implications in terms of computational capacity, security, and company policies. Some companies may not want to run a VM on each device because they prefer to use a firewall to secure their LANs.

While running VMs on all devices results in end-to-end QoE, running them at the egress/ingress points of LANs may result in performance uncertainties, because QoE is guaranteed only between those two points. Furthermore, CSPs inside the company likely share most of their links, since internal routing options are typically limited. This can impact performance due to cross-interference and congestion.

The second option discussed above is an intermediate solution between the other two. Instead of running VMs on all LAN devices, only certain critical aggregation points execute the VMs. This technique is not as aggressive as running VMs on every device (the third option) and can still certify QoEs within the LAN.

CSP Assignment

The number of paths connecting two endpoints through the public Internet can be very large. The agents are responsible for deciding the proper set of tunnels to be promoted and used as CSPs. The selection of the CSP used to schedule and transport packets is dynamic. This decision process is based on two criteria: the traffic requirements and the CSP characteristics. The former results from classifying the type of traffic being transported, while the latter results from proactive monitoring of CSPs. This allows the system to dynamically adapt the traffic and react to changes in the network, so that agents use the CSPs whose metrics best match the traffic requirements, the observable path being one of those metrics. For example, different CSPs could traverse distinct network paths (both entirely observable), and the system could decide to replicate packets on these distinct CSPs to enhance reliability.

Regardless of where the VMs are run, FIG. 9B illustrates the relationship between a set of tunnels and a pool of CSPs. Endpoint A (950) and endpoint B (970) communicate over the Internet through a set of intermediate routers (960). In this scenario there are four unique paths, shown in the first four rows of the table below.

In addition, different options for running the VMs are illustrated. In one embodiment, the VMs are run on endpoint equipment (980), while in another embodiment the VMs are executed only at the company's exit router (950).

The table below shows six possible paths and the nodes traversed by each:

Path      Nodes crossed
Path 1    0 1 2 3 5 6 7
Path 2    0 1 2 3 6 7
Path 3    0 1 2 4 6 7
Path 4    0 1 2 z 4 6 7
Path 5    0 1 2 3 6 7
Path 6    0 1 2 3 5 6 7

Note that two different CSPs can traverse the same route, so an additional tunnel may cross the same nodes as one of the other tunnels. For example, Tunnel 5 traverses the same nodes as Tunnel 2, and Tunnel 6 traverses the same nodes as Tunnel 1. Even when two different CSPs traverse the same nodes, they may have different performance characteristics.
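Grouping tunnels by the node sequence they traverse shows why the six rows of the table correspond to only four unique paths. The node lists below are copied from the table, with the label "z" reproduced exactly as it appears there.

```python
from collections import defaultdict

paths = {
    "Path 1": "0 1 2 3 5 6 7",
    "Path 2": "0 1 2 3 6 7",
    "Path 3": "0 1 2 4 6 7",
    "Path 4": "0 1 2 z 4 6 7",
    "Path 5": "0 1 2 3 6 7",
    "Path 6": "0 1 2 3 5 6 7",
}

# Key each tunnel by its ordered node sequence; tunnels sharing a key share a route.
groups = defaultdict(list)
for name, nodes in paths.items():
    groups[tuple(nodes.split())].append(name)

print(len(groups))  # 4
for route, names in groups.items():
    print(" ".join(route), "->", names)
```

Grouping only by route is not sufficient on its own, because tunnels in the same group (for example Paths 1 and 6) can still show different performance and are measured separately.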

In this example, agents might have one active CSP (Tunnel 1), a standby CSP (Tunnel 2), and a probing CSP (Tunnel 3). One possible set of CSP states and state transitions is described in detail below. The active management and monitoring of CSPs evaluates each one against certain thresholds and takes action accordingly (e.g., removing an underperforming CSP). If the probing CSP does not outperform the active and/or the standby CSP, it can be discarded, and its CSP identifier (5-tuple) is reused for another tunnel. This new tunnel then becomes the probing CSP, to determine whether its performance improves on the other two CSPs being used.

In one embodiment of the present invention, a pool of CSPs is used in which one or more of them are probing CSPs. Their performance is evaluated periodically, for example every five minutes. In alternative embodiments, the time between measurements could be based on other logic or behavior patterns. To reduce the measurement overhead, passive measurements can be used if real traffic is being transmitted, and active measurements can be used if there is no traffic on those CSPs.

In some embodiments, if multiple consecutive tests of a probing CSP/tunnel show that it does not outperform any of the other CSPs, the system discards the CSP and selects a better one as a replacement. The criterion for discarding a CSP may consist of consecutive tests distributed over time. If there are active flows on the CSP, it remains active until those flows finish, but no new flows are assigned to it. Discarding causes the CSP/tunnel to be relaunched with a new 5-tuple, which could be the same as the previous 5-tuple. The new CSP may differ in its source transport port, which is now chosen randomly from the set of available ports at the source. This port modification is performed to minimize the possibility of reusing the same discarded path. Despite this change, it is not guaranteed that the “new” tunnel has different properties from the discarded or active tunnels. For this reason, the agents check whether it does.
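The discard-and-relaunch rule can be sketched as a small slot manager for one probing CSP. This is a minimal sketch with hypothetical names: `max_failures` stands in for the unspecified number of consecutive losing tests, the protocol is fixed to UDP, and draining of active flows is not modeled. As the text notes, the random draw may return the previous source port.

```python
import random

class ProbeSlot:
    """Hypothetical manager for one probing CSP identified by its 5-tuple."""

    def __init__(self, src_ip, dst_ip, dst_port, ports, max_failures=3, rng=None):
        self.src_ip, self.dst_ip, self.dst_port = src_ip, dst_ip, dst_port
        self.ports = list(ports)          # transport ports available at the source
        self.max_failures = max_failures  # consecutive losing tests before discard
        self.rng = rng or random.Random()
        self.failures = 0
        self.src_port = self.rng.choice(self.ports)

    def five_tuple(self):
        return (self.src_ip, self.dst_ip, self.src_port, self.dst_port, "udp")

    def record_test(self, outperformed_existing):
        """Record one periodic test; return True when the tunnel is relaunched."""
        if outperformed_existing:
            self.failures = 0
            return False
        self.failures += 1
        if self.failures < self.max_failures:
            return False
        self.failures = 0
        # Relaunch with a randomly chosen source port (it may repeat the old one).
        self.src_port = self.rng.choice(self.ports)
        return True

slot = ProbeSlot("10.0.0.1", "203.0.113.9", 1194, range(40000, 40010),
                 rng=random.Random(7))
print([slot.record_test(False) for _ in range(3)])  # [False, False, True]
```

After a relaunch, the agent would measure the new tunnel again, since a new port does not guarantee a new path.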

In alternative embodiments, multiple CSPs that use the same underlying path are maintained, because two CSPs can perform differently even if their network path is the same. These differences can arise from the routers and their queuing, routing, and/or internal load-balancing strategies.

In some embodiments, agents send keep-alives through an option of the VPN software to keep the CSP open. This technique periodically generates a network message, for example an ICMP “ping” packet every ten seconds, in case of inactivity, to keep the tunnel open.
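The keep-alive logic only needs to track the time of the last activity. The sketch below assumes the caller supplies the current time (which keeps the logic testable) and leaves the actual sending of the ping to the caller. The class and method names are hypothetical; the ten-second default comes from the example above.

```python
class KeepAliveTimer:
    """Hypothetical idle timer: signal a keep-alive after `interval_s` of inactivity."""

    def __init__(self, interval_s=10.0):
        self.interval_s = interval_s
        self.last_activity_s = 0.0

    def note_traffic(self, now_s):
        """Real traffic on the tunnel resets the idle timer."""
        self.last_activity_s = now_s

    def poll(self, now_s):
        """Return True if a keep-alive should be sent now; sending counts as activity."""
        if now_s - self.last_activity_s >= self.interval_s:
            self.last_activity_s = now_s
            return True
        return False

timer = KeepAliveTimer()
timer.note_traffic(0.0)
print(timer.poll(5.0), timer.poll(10.0), timer.poll(15.0))  # False True False
```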

In alternative embodiments, policies are used to determine when to probe new tunnels and to guide the promotion/demotion process with dynamic algorithms. Policies also determine the sampling frequency used to test CSPs and the granularity required to properly adapt to network conditions.

Route Identification

One way to differentiate CSPs is by the path they traverse. When this information is available, it also addresses another major concern in the public Internet: visibility. Users would like to gain control over, or at least knowledge of, which nodes (e.g., set of IP addresses) their traffic traverses. Embodiments of the present invention use two different techniques to provide this visibility.

The first method is the “traceroute” technique. This technique sends a sequence of packets with incrementally increasing Time To Live (TTL) values. When a packet exhausts its TTL at a given router, that router sends a message back to the source identifying the node that was reached.

For example, the first packet sent has TTL=1. When the first router receives this packet, it decrements the TTL by 1. Since the resulting value is zero, this router discards the packet and replies to the source indicating its own address or identification.
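The TTL mechanism can be simulated without sending real packets. In this sketch, a path is a hypothetical list of node labels ending at the destination, and `None` models a router that does not answer. Each probe's TTL is decremented once per hop, and the node where it reaches zero is the node reported back.

```python
def simulate_traceroute(path, max_ttl=30):
    """Return the node reported for each TTL on a simulated path (last node = destination)."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        remaining = ttl
        for node in path:
            remaining -= 1           # each hop decrements the TTL
            if remaining == 0:
                hops.append(node)    # TTL exhausted here: this node is reported
                break
        else:
            break                    # probe outlived the path: destination already found
    return hops

print(simulate_traceroute(["r1", None, "dst"]))  # ['r1', None, 'dst']
```

The simulation assumes that every probe follows the same path, which is exactly the assumption that does not always hold in practice.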

The traceroute method is not entirely accurate. First, routers are not obliged to answer, and when they do, they may not send correct information about themselves: they may send a generic answer indicating their ISP or even provide a false IP address. While this can be done on purpose, this is not generally the case. Traceroute anomalies are analyzed and explained in “Avoiding traceroute anomalies with Paris traceroute” by B. Augustin et al., incorporated herein by reference. Traceroute anomalies are generally related to the topology itself or to the routers' load-balancing policies.

Also, the traceroute technique makes an assumption that is not always satisfied in packet-switched networks: one packet can use a certain path while the next one uses a different one. In this case, the incremental-TTL packets do not identify routers along a single path, which could lead to incorrect link identifications.

In one embodiment, an alternative method is used that modifies the traceroute technique so that probes with altered TTL values masquerade as actual traffic of the connection flow to be traced. Packets are forged to have a 5-tuple equal to that of the connection flow's packets, plus a random payload that is fingerprinted so that router replies can be matched to specific probes. This modified traceroute technique is more reliable than traditional traceroute techniques because it simulates actual traffic flows instead of using well-known or random ports for the probes.

Together with the flow stickiness effect (generally preserved even when crossing load balancers), this technique yields repeatable route-discovery results, including the detection of route changes over time. However, it has drawbacks that affect all traceroute techniques: (i) NAT address rewriting in private networks, and (ii) ISP routers that do not respond when a TTL expires.

A drawback specific to the modified traceroute technique is the possibility of not receiving any answer from the last hop, because the 5-tuple is an actual valid tuple that is forwarded to a service listening on that port (e.g., the OpenVPN service). The tunnel service drops the packet because it does not follow the internal protocol requirements, yet no ICMP Time Exceeded message is generated for that probe. To handle this, the tool keeps sending probes with incrementing TTL values for some time after it stops receiving answers, before quitting and concluding that the end of the connection has been reached. Other route discovery techniques are possible, and in alternative embodiments, other route discovery mechanisms are used.
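The probe loop of the modified technique, including matching replies by payload fingerprint and stopping after a run of unanswered probes, can be sketched independently of packet I/O. The probe transport is abstracted into a caller-supplied function `send_probe(ttl, nonce)` (hypothetical). It is assumed to send a probe that reuses the tunnel's 5-tuple and carries `nonce` as its payload, and to return `(router, echoed_nonce)` for a reply or `None` on timeout. `max_silent` is an assumed stopping parameter.

```python
import os

def trace_flow(send_probe, max_ttl=30, max_silent=3):
    """Hypothetical modified-traceroute loop with fingerprint matching and a silence cutoff."""
    hops, silent = [], 0
    for ttl in range(1, max_ttl + 1):
        nonce = os.urandom(8)             # random payload fingerprint for this probe
        reply = send_probe(ttl, nonce)
        if reply is not None and reply[1] == nonce:
            hops.append(reply[0])         # reply matched to this specific probe
            silent = 0
        else:
            hops.append(None)             # no answer, or a reply to some other probe
            silent += 1
            if silent >= max_silent:
                break                     # conclude the end of the connection was reached
    while hops and hops[-1] is None:      # drop the trailing unanswered probes
        hops.pop()
    return hops

# Fake network: r2 never answers, and the tunnel endpoint silently drops probes.
routers = ["r1", None, "r3"]
def fake_probe(ttl, nonce):
    if ttl <= len(routers) and routers[ttl - 1] is not None:
        return (routers[ttl - 1], nonce)
    return None

print(trace_flow(fake_probe))  # ['r1', None, 'r3']
```

An unanswered hop in the middle of the path is kept as a gap, while the run of unanswered probes at the end, caused by the listening tunnel service, is treated as the end of the connection.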

Once a technique is established that can provide information about the network nodes traversed along a tunnel, that information, including the set of nodes, can be maintained, and other metrics of interest can be collected for each path. Thus, instead of managing large CSP pools, a reduced set of CSPs can be used. This optimization reduces overhead by simplifying CSP management.

CSP States

Network fluctuations affect the performance of CSPs, which rules out a static configuration set when agents are initialized. Rather, CSPs need to adapt to continuously changing Internet conditions. To address this issue, each CSP has a set of possible states that reflect different situations based on CSP roles and their metrics (e.g., throughput, latency, etc.).

While the system appears stable to the end user, the set of CSPs used is dynamically changed to maintain the desired QoE. FIG. 10 illustrates an example of possible states of a CSP. One embodiment uses five states: active (131), standby (111), waiting (141), demoted (121), and probing (101). Other embodiments can modify these states, for instance by removing the waiting state. The state diagram shown in FIG. 10 represents the transitions between those states. These states are described in detail below.

Active: This is a CSP that can send, or is currently sending, traffic.

Standby: This state implies that the CSP is qualified to be used if promoted to active. This promotion is based on the CSP metrics (bandwidth, latency, etc.). Once promoted, the standby CSP becomes active. A standby CSP can also be eliminated if its quality violates a specific criterion.

Waiting: CSPs that remain open and that can be promoted to standby if their performance exceeds certain limits or if more resources are required. For instance, a waiting CSP may have high performance at some times and poor performance during other periods.

Demoted: Once an active CSP degrades, it goes to the demoted state. This state implies that the system cannot assign new flows or traffic to that CSP. Moreover, there are different options for handling ongoing traffic over a demoted CSP. For example, an aggressive option would consist of cutting the CSP and letting TCP handle the packet loss (for packets that were being transmitted or were in the buffers). Another option would implement a soft transition between CSPs without losing packets: the ongoing packets are allowed to be sent, and then the CSP is demoted completely.

Probing: A probing CSP is a tunnel under analysis (according to the network metrics of interest) prior to being promoted as a valid CSP or being eliminated.

Deleted CSPs are not considered a state, since everything related to them is erased. The crosses in FIG. 10 represent elimination of a CSP.
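The state descriptions above can be encoded as a transition table. The transitions in this sketch are inferred only from the prose: probing leads to a valid state or deletion, waiting leads to standby, standby leads to active or deletion, active leads to demoted, and demoted leads to deletion. Treating "promoted as a valid CSP" as entering waiting or standby is an assumption. FIG. 10 may allow transitions that are not listed here. Deletion is modeled as `None` because it is not a state.

```python
from enum import Enum

class CSPState(Enum):
    PROBING = "probing"
    WAITING = "waiting"
    STANDBY = "standby"
    ACTIVE = "active"
    DEMOTED = "demoted"

DELETED = None  # not a state: everything related to the CSP is erased

# Transitions inferred from the state descriptions (assumption; FIG. 10 may differ).
ALLOWED = {
    CSPState.PROBING: {CSPState.WAITING, CSPState.STANDBY, DELETED},
    CSPState.WAITING: {CSPState.STANDBY},
    CSPState.STANDBY: {CSPState.ACTIVE, DELETED},
    CSPState.ACTIVE: {CSPState.DEMOTED},
    CSPState.DEMOTED: {DELETED},
}

def transition(current, target):
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current} -> {target}")
    return target

print(transition(CSPState.STANDBY, CSPState.ACTIVE))  # CSPState.ACTIVE
```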

The conditions that determine the different state transitions depend on network metrics including throughput, latency, and packet loss. In some embodiments of the present invention, these transitions are static and manually configured. In alternative embodiments, the process is automated and takes more metrics into account.

Agents manage a queue for each state that contains CSP IDs. When an agent requires a new CSP to transmit on, it goes to the active queue and selects the CSP ID with the desired metric. In addition, priorities can be applied to these queues to optimize the CSP selection process.
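Per-state queues ordered by a metric can be kept as binary heaps, so the best CSP in a state is available in constant time. This sketch uses Python's `heapq` with a "lower is better" metric such as latency. For a "higher is better" metric such as bandwidth, the negated value would be pushed instead. The class and method names are hypothetical.

```python
import heapq

class StateQueues:
    """Hypothetical per-state queues of CSP IDs, each ordered by a chosen metric."""

    def __init__(self):
        self.queues = {}  # state name -> heap of (metric, csp_id)

    def add(self, state, csp_id, metric):
        heapq.heappush(self.queues.setdefault(state, []), (metric, csp_id))

    def best(self, state):
        """Return the CSP ID with the lowest metric in `state`, without removing it."""
        queue = self.queues.get(state)
        return queue[0][1] if queue else None

queues = StateQueues()
queues.add("active", "t1", 40.0)   # latency in ms
queues.add("active", "t2", 12.0)
queues.add("standby", "t3", 5.0)
print(queues.best("active"), queues.best("waiting"))  # t2 None
```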

Flow to CSP Assignment

Policies specify the maximum number of sub-flows for a given flow and the number of CSPs it can use. Within these parameters, flow-to-CSP assignments are made to optimize performance.

In one embodiment, when a new flow arrives at the VM, it is divided into as many sub-flows as there are active CSPs. Then, one sub-flow is sent through each active CSP. This process is static and is performed each time a new flow arrives at an agent.

In alternative embodiments, a dynamic flow/sub-flow division based on traffic characterization and network conditions is used. A dynamic solution has advantages, because a given flow division and CSP assignment may perform well at one moment but poorly at another. In some embodiments, the number of sub-flows is variable and the CSP assignment can change dynamically. CSPs are dynamic as well, and the dynamic assignment can depend on application type, time of day, type of users, etc.
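One possible dynamic policy, not specified in the text, is to split a flow's sub-flows among CSPs in proportion to each CSP's currently measured available bandwidth, and to recompute the split whenever the measurements change. The sketch below uses largest-remainder apportionment so that the shares always sum exactly to the requested number of sub-flows. The function name and inputs are hypothetical.

```python
def split_by_capacity(num_subflows, available_mbps):
    """Divide sub-flows among CSPs in proportion to measured available bandwidth."""
    total = sum(available_mbps.values())
    if total <= 0:
        raise ValueError("no available capacity")
    quotas = {csp: num_subflows * bw / total for csp, bw in available_mbps.items()}
    shares = {csp: int(q) for csp, q in quotas.items()}  # whole sub-flows first
    leftover = num_subflows - sum(shares.values())
    # Give the remaining sub-flows to the CSPs with the largest fractional remainders.
    by_remainder = sorted(quotas, key=lambda csp: quotas[csp] - shares[csp], reverse=True)
    for csp in by_remainder[:leftover]:
        shares[csp] += 1
    return shares

print(split_by_capacity(10, {"t1": 60, "t2": 30, "t3": 10}))  # {'t1': 6, 't2': 3, 't3': 1}
```

Re-running the function with fresh measurements changes the division over time, which is the behavior the dynamic embodiments describe; the static embodiment corresponds to giving every active CSP exactly one sub-flow.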

Architecture

Managing large numbers of CSPs between different endpoints, adapting flow-to-CSP assignments to network conditions, and guaranteeing end-to-end QoE is a significant task from scalability and complexity perspectives. This management includes acting upon real-time information such as network status (e.g., congestion, failures, etc.) and cross-traffic interference, and dealing with the stochastic nature of the public Internet, among other factors. A hierarchical architecture has been defined to tackle these problems. FIG. 11 illustrates this architecture. One embodiment uses a three-layer hierarchical system in which the top layer is the Global Layer (151), the middle layers are the Mid Layers (161), and the lowest layer is the Data Layer (171). At the bottom of the architecture is the data level, which manages information, measurements, and packets. Above the data level is the mid level, which can be decomposed into different sub-levels to manage multiple levels of aggregation. Its scope encompasses Local Contextual Information (LCI), including enterprise requirements (e.g., departmental division, user policies, etc.) and network topologies (e.g., number of ISPs, ports, etc.). This level handles policies, including rules for CSPs and flow-to-CSP assignments based on LCI. At the top of the architecture is the global level, which deals with world-scale events such as socio-political situations, catastrophic disasters, sports events, etc. This Global Contextual Information (GCI) is used to generate policies to guide the lower levels. Interfaces handle communication and exchange of information between these levels. A key characteristic is the semi-independent regime of operation: if the data layer gets disconnected from the upper layers, it can continue operating based on the information it has available.

Embodiments of the present invention have distributed learning algorithms in each layer that optimize its functions by exploiting the layer's particular contextual information. Because each layer deals with a different type of information, the design provides a scalable solution for guaranteeing QoEs over the public Internet.

The higher the level, the higher the abstraction and the broader the scope of the system, but the coarser the granularity. These levels also reflect different time scales: the data level can operate in real time, but the mid and global levels do not. These different regimes of operation result from a distributed architecture with agents (data level) running at endpoints and the mid and global levels running on premises or in the Cloud.

Single Enterprise Perspective

When an enterprise wants to guarantee end-to-end QoEs using embodiments of the present invention, the first step is deploying agents at its offices. Together, these agents implement the data layer. Each agent is responsible for managing CSPs and assigning flows to them based on a set of policies.

These policies are an output of the mid layer, which generates them based on LCI. Simultaneously, the mid layer receives policies from the global layer, which takes GCI into account. In this single-enterprise case, the mid and global layers are dedicated to ensuring that enterprise's QoEs. FIG. 12 shows this situation. Agents implementing the Data layer (221, 261, 251) at enterprise endpoints are coupled through ISPs (231) to the public Internet (241). The Mid layer (211) and Global layer (201) are also coupled to the public Internet (241) and communicate with each other and with the Data layer (221, 261, 251).

Global Perspective

In embodiments of the present invention, the data layer is replicated for each client, together with some of the sub-layers inside the mid level. The global level is shared among all customers to exploit joint analytics and traffic. Despite this aggregation of information, each enterprise domain is isolated to ensure the security and privacy of its data and communications.

A global perspective of the architecture is presented in FIG. 13. Agents implementing the Data layer at a first enterprise, “Company A” (321, 331, 371), and a second enterprise, “Company B” (341, 361), are coupled through ISPs (351) to the public Internet (391). The Mid layer for Company A (311), the Mid layer for Company B (381), and the Global layer (301) are also coupled to the public Internet (391) and communicate with each other and with the Data layers (321, 331, 371, 341, 361).

In this case, there are two enterprises. Each office has an agent to establish and handle connections with other offices. These agents receive policies and commands from their respective mid layers, which run in the cloud. At the same time, the mid layer software receives policies from the global layer, which takes the GCI into account. Global layer software may run on distributed Clouds around the world.

In this illustrative example, only offices within the same enterprise can connect to each other. In alternative embodiments, more complex solutions are implemented to interconnect offices from different enterprises. This topology could increase the complexity of the policies. Different sub-layers within the mid level can be configured to handle this complexity by generating policies adapted to a specific scenario.

In an alternative embodiment, a higher sub-layer that handles inter-enterprise connections is placed on top of the enterprise-specific mid-level sub-layers. This configuration is illustrated in FIG. 14A and FIG. 14B. Agents implementing the Data layer at Company A (421, 431, 471) and Company B (441, 461) are coupled through ISPs (451) to the public Internet (491). The Mid layer for Company A (411) and for Company B (481), an inter-enterprise mid layer (483), and the Global layer (401) are also coupled to the public Internet (491) and communicate with each other and with the Data layers (421, 431, 471, 441, 461). Conceptually, the three mid layers can be viewed as an A-B inter-company layer (403) on top of a Company A mid layer (413) and a Company B mid layer (423). In this situation, the mid level has two different sub-layers.

One advantage of creating different sub-layers inside the mid level is to exploit locality awareness and joint characteristics between enterprises. A policy designed by combining LCI from different enterprises can result in improved performance. Both the mid-level sub-layers and the global layer run in the Cloud. The layered architecture handles this complexity while remaining fully transparent to the end user.

Data Layer

The data layer deals with packets and flows, accessing and acting upon the user data. This layer has two major components: the data plane and the control plane. The data plane is in charge of forwarding packets based on a set of switching and routing policies. This plane handles the flows and their corresponding sub-flows, and it is also where traffic shaping and prioritization take place. In contrast, the control plane focuses on managing CSPs, flow-to-CSP assignments, sub-flow policy enforcement, and learning algorithms.

A key task is the measurement and collection of CSP metrics. Embodiments of the present invention collect a set of metrics as described above, including bandwidth, packet loss, and latency. The purpose of these analytics is to monitor the performance of the system in real time and make decisions to maintain and improve the QoE.

An important function is the Extraction, Transformation, and Loading (ETL) of several network metrics and key performance indicators (KPIs). In an embodiment of the invention, all of this data is curated and then sent to the mid layers according to its granularity. Ideally, only strictly necessary information goes from one layer to another.

These inter-layer communications can be periodic. For example, a set of data could be sent from the data layers to the mid layers every second. The value of this interval is crucial: it must keep the system responsive without jeopardizing its scalability. A shorter interval provides finer granularity at the cost of a large overhead in both processing and transmission. A longer interval avoids that cost but increases the reaction time. Thus, there is a trade-off between accuracy and resource usage in controlling the system's granularity. In this embodiment, the interval for data layer to mid layer communication is approximately one second. In alternative embodiments, other intervals are implemented.

In one embodiment, ZeroMQ, a high-performance asynchronous messaging library, and MessagePack, a binary data interchange format, enable this inter-layer communication. In addition, embodiments of the data plane support Message Queue Telemetry Transport (MQTT), an ISO-standard, publish-subscribe-based lightweight messaging protocol for use on top of TCP/IP.

An exception (event-based) mechanism is used to improve the reaction time. When the data layer detects an abrupt change in a CSP, it sends an exception to the mid layers outside of the interval-based communication. This exception triggers different events in the mid layers, which decide how to react to such variations. This technique improves system scalability while enhancing the reaction time to unexpected changes, at a small overhead. This feature reinforces the transparency towards end users and applications.
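The interval-based reporting and the exception mechanism can be combined in a single report agent. The following Python sketch is illustrative only: the class name `CspReporter`, the 50% deviation threshold, the EWMA baseline and the message tuples are assumptions rather than part of the described embodiment, and only latency is tracked.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class CspReporter:
    """Hypothetical report agent for a single CSP (latency samples only)."""
    interval_s: float = 1.0   # periodic reporting interval (about 1 s in the text)
    jump_ratio: float = 0.5   # assumed threshold: >50% deviation counts as "abrupt"
    alpha: float = 0.2        # smoothing factor for the baseline (EWMA)
    baseline: Optional[float] = None
    last_flush: float = 0.0
    window: List[float] = field(default_factory=list)
    outbox: List[Tuple[str, float, float]] = field(default_factory=list)

    def add_sample(self, now: float, latency_ms: float) -> None:
        # Event-based path: report an abrupt change immediately.
        if self.baseline is not None and \
                abs(latency_ms - self.baseline) > self.jump_ratio * self.baseline:
            self.outbox.append(("exception", now, latency_ms))
        # Update the smoothed baseline used to detect abrupt changes.
        if self.baseline is None:
            self.baseline = latency_ms
        else:
            self.baseline = (1 - self.alpha) * self.baseline + self.alpha * latency_ms
        self.window.append(latency_ms)
        # Interval-based path: one curated value (the mean) per interval.
        if now - self.last_flush >= self.interval_s:
            self.outbox.append(("report", now, sum(self.window) / len(self.window)))
            self.window.clear()
            self.last_flush = now
```

With these defaults, a CSP whose latency jumps from about 20 ms to 80 ms produces an immediate `"exception"` entry without waiting for the next periodic `"report"`.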

Besides monitoring tunnels and CSPs, the data layer is also responsible for executing the tasks required to distribute flows across different CSPs. This function includes encapsulation of packets, optional encryption, and the execution of fork/join operations over the flows to enable Multipath Transport.
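A minimal Python sketch of the fork/join step, assuming fixed-size chunks and round-robin assignment; `Segment`, `fork` and `join` are hypothetical names, and encapsulation and encryption are omitted. The sequence number carried by each chunk is what lets the receiving endpoint rebuild the flow even when chunks sent over different CSPs arrive out of order.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Iterable, List


@dataclass(frozen=True)
class Segment:
    seq: int        # position of the chunk in the original flow
    csp: str        # CSP the chunk is sent over
    payload: bytes


def fork(data: bytes, csps: List[str], chunk: int) -> List[Segment]:
    """Split a flow into fixed-size chunks and spread them round-robin over the CSPs."""
    targets = cycle(csps)
    return [Segment(seq, next(targets), data[off:off + chunk])
            for seq, off in enumerate(range(0, len(data), chunk))]


def join(segments: Iterable[Segment]) -> bytes:
    """Reassemble the flow in sequence order, regardless of arrival order."""
    return b"".join(s.payload for s in sorted(segments, key=lambda s: s.seq))
```

For example, `fork(b"abcdefghij", ["csp-a", "csp-b"], 3)` alternates four chunks between the two CSPs, and `join` restores the original bytes even if the list is reversed in transit.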

In addition, the data layer measures characteristics of physical or virtual interfaces such as available bandwidth, packet loss and latency. In parallel, the data layer performs basic operations over tunnels and CSPs. For example, the data layer is responsible for the keep-alive messages that keep CSPs open, independently of the CSP state. The same principle applies to tunnels: if no traffic is sent over a tunnel, either the tunnel becomes a CSP or it is discarded.
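The keep-alive and idle-tunnel rule can be sketched as follows. This is a hypothetical illustration: the `Path` record, the idle timeout and the `promote` callback (standing in for the mid-layer policy that decides promotions) are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Path:
    name: str
    kind: str            # "csp" or "tunnel"
    last_traffic: float  # time (s) of the last packet sent over the path


def housekeeping(paths: List[Path], now: float, idle_s: float,
                 promote: Callable[[Path], bool]) -> Tuple[List[str], List[str], List[str]]:
    """Handle idle paths; returns (keep-alives sent, tunnels promoted, tunnels discarded)."""
    keepalive, promoted, discarded = [], [], []
    for p in paths:
        if now - p.last_traffic < idle_s:
            continue                      # recent traffic already keeps the path open
        if p.kind == "csp":
            keepalive.append(p.name)      # CSPs are kept open regardless of their state
            p.last_traffic = now          # the keep-alive itself counts as traffic
        elif promote(p):
            p.kind = "csp"                # the idle tunnel becomes a CSP...
            promoted.append(p.name)
        else:
            discarded.append(p.name)      # ...or it is dropped
    paths[:] = [p for p in paths if p.name not in discarded]
    return keepalive, promoted, discarded
```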

The description above relates to the different functions of the data layer and what it sends to the mid layer. In the opposite direction, the mid layer sends policies for the data layer to enforce. A policy is a mapping from an information state to an action. Given the state of the system, policies determine which CSPs can be used to send what type of traffic and how traffic is divided into sub-flows, among other functions. A policy also triggers the transitions between the different CSP states (e.g., a CSP goes from active to standby), as well as the promotion of tunnels to CSPs and their demotion. Policies also control other events, such as the way data is captured.
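To make "a mapping from an information state to an action" concrete, here is a hypothetical policy written as a Python function. The traffic classes, the 50 ms latency and 1% loss thresholds, and the `Action` fields are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class CspState:
    status: str        # "active" or "standby"
    latency_ms: float
    loss: float        # packet loss ratio, 0..1


@dataclass(frozen=True)
class Action:
    allowed_csps: Tuple[str, ...]  # CSPs this traffic class may use
    subflows: int                  # how many sub-flows to split the flow into


def example_policy(traffic_class: str, csps: Dict[str, CspState]) -> Action:
    """Hypothetical policy: maps (traffic class, CSP states) to an action."""
    active = {n: s for n, s in csps.items() if s.status == "active"}
    if traffic_class == "interactive":
        # Latency-sensitive traffic: low-latency, low-loss CSPs only, no splitting.
        ok = tuple(sorted(n for n, s in active.items()
                          if s.latency_ms < 50 and s.loss < 0.01))
        return Action(ok, 1)
    # Any other (bulk) traffic: every active CSP, one sub-flow per CSP.
    ok = tuple(sorted(active))
    return Action(ok, max(1, len(ok)))
```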

In an embodiment of the present invention, CSPs come in forward-return pairs. While the forward and return paths need not be congruent, they are not chosen independently. Hence, the choice of a CSP should be a joint responsibility of the head point (source) and the tail point (destination). The critical information is still local (mostly concerning access congestion at the ingress and egress links), but it is local at both endpoints (or more, if the communication includes other destinations). This is why the CSP choice is made at the mid layer. Whether two or more endpoints are involved, the mid layer acts as a broker that takes decisions affecting different agents.

In addition to policies, the mid layer can send a command to promote a specific tunnel to a CSP, based on the analytics the data layer has sent, or to request a specific metric. Another example command is probing a tunnel. Upon receiving a command, the data layer executes the requested action. Thus, the data layer observes all the CSPs regardless of their state, but acting upon them requires policies from the mid layers. Control is off-loaded from the data layer in this case for scalability reasons.

Policies can have different degrees of freedom that the data layer can exploit. For example, a policy may indicate that only CSPs with a throughput greater than 10 Mbps can be used. In this situation, the data layer can decide which of the CSPs that fulfill this condition are used. In some embodiments, strict policies act as control algorithms based on conditional statements. In other embodiments, the mid-layer policies allow more flexibility, and an intelligent algorithm can exploit those degrees of freedom to optimize performance. For example, a machine learning algorithm may use the reactive information (i.e., tests on each CSP) to decide over which CSPs the data is sent, while staying within the mid-layer policies.
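The 10 Mbps example can be sketched as a hard constraint plus a free choice. In this hypothetical Python function the mid-layer policy only filters CSPs by throughput; selecting the lowest-RTT CSP among the survivors is an assumed stand-in for the data layer's own (possibly learned) decision.

```python
from typing import Dict, Optional


def choose_csp(csps: Dict[str, Dict[str, float]], min_mbps: float = 10.0) -> Optional[str]:
    # Constraint from the mid-layer policy: throughput must exceed min_mbps.
    eligible = {n: m for n, m in csps.items() if m["mbps"] > min_mbps}
    if not eligible:
        return None  # nothing satisfies the policy; the mid layer would need to react
    # Degree of freedom left to the data layer (assumed objective): lowest RTT.
    return min(eligible, key=lambda n: eligible[n]["rtt_ms"])
```

Note that a CSP with the best RTT but only 8 Mbps is never chosen: the data layer optimizes only inside the space the policy leaves open.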

In one embodiment of the present invention, all of the functions of the data layer run on a Virtual Machine (VM) in agents located at each endpoint. One VM can manage the set of physical ports at one endpoint. The data layer is important because it constitutes the foundation of the overall architecture: multipath optimizations and learning algorithms are built on top of it.

In summary, in a preferred embodiment the data layer performs the following functions: 1. Measure metrics over CSPs and tunnels on an interval basis; 2. Curate the analytics; 3. Keep CSPs alive; 4. Execute mid-layer policies and commands; 5. Perform real-time control operations; and 6. Decode and classify applications (may be controlled by policies).

In a preferred embodiment, the data layer has the following inputs: 1. Mid-layer policies; and 2. Commands for tunnel, CSP, and flow management. It has the following outputs: 1. Curated analytics (CSPs and tunnels) such as RTT, one-way delay, throughput, capacity, and traceroute; and 2. Exceptions raised in extreme situations (e.g., sudden loss of certain capabilities such as available BW).

Some embodiments of the data layer are composed of the following modules: 1. Sub-flow manager; 2. Application classifier; 3. Network controller; and 4. Report agent. In alternative embodiments, the data layer also implements machine learning algorithms that exploit the degrees of freedom allowed by the mid-layer policies and discover new network metrics and the relations among them.

Mid Layers

The mid layer is the layer that understands and controls the system from each company's perspective. In one embodiment of the present invention, the information it handles includes details about the ISPs, physical interfaces, IP addresses, transport ports, topologies, etc. This information constitutes what is referred to as local contextual information (LCI). In addition, LCI includes priorities between different applications, application and flow categories used to create patterns that guide new policies, and information about cross-traffic, among other things. Thus, this layer builds the picture of the system inside each enterprise.

The mid layers also instruct the data layer to probe different tunnels by passing a tunnel identifier. Once the data layer sends back the network measurements, the mid layers can decide, through policies or commands, whether to promote tunnels to CSPs or to demote them. These orders are then sent back to the data layer, which executes the desired policy or command. Other commands can include measuring available bandwidth and latency, transitioning between CSP states, modifying the frequency of ETLs in the data layer, and defining policies using local contextual awareness (enterprise-level).
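A threshold-based sketch of the promote/demote decision a mid layer might take after a probe. The thresholds and the `ProbeResult` record are hypothetical; the text leaves this decision to policies, commands or learning algorithms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    tunnel_id: str
    rtt_ms: float
    loss: float   # packet loss ratio, 0..1
    mbps: float


def decide(probe: ProbeResult, is_csp: bool, max_rtt_ms: float = 100.0,
           max_loss: float = 0.02, min_mbps: float = 5.0) -> str:
    """Return the command for the data layer: "promote", "demote" or "keep"."""
    good = (probe.rtt_ms <= max_rtt_ms and probe.loss <= max_loss
            and probe.mbps >= min_mbps)
    if good and not is_csp:
        return "promote"   # tunnel measured well: make it a CSP
    if not good and is_csp:
        return "demote"    # CSP has degraded: send it back to the tunnel pool
    return "keep"
```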

In addition, the mid layer takes as inputs application and user requirements that, in combination with curated analytics, guide policies. Since the mid layer is a logical layer, it does not see flows or application data. In some embodiments, network configurations (transport ports, IP addresses, interfaces, etc.) are entered in this layer through a GUI by IT personnel in each enterprise. In alternative embodiments, this information is the result of a self-discovery process.

As outputs to higher layers, the mid layer sends the global layer information about performance bottlenecks due to unexpected situations and the status of the connections (from all the mid-level sub-layers). Status tests and data analytics are curated again before going to the higher level, reducing granularity and increasing abstraction for better scalability. In the opposite direction, the mid layer receives policies and commands from the global layer. Since these rules come from higher layers, they are more abstract, for example: do not use network paths that cross a certain country, to avoid a political situation. The relationship between the data layer, mid layer and global layer is analogous to a military organization: soldiers (data layers) receive orders from captains (mid layers), who in turn follow orders from generals (global layer). The higher the layer originating a policy, the coarser its granularity, but the higher the priority with which lower levels must guarantee it.
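One way to read "the higher the layer, the higher the priority" is as a set of constraints in which lower-layer ones may be relaxed but global-layer ones may not. The Python sketch below assumes that relaxation rule and the numeric priorities; neither is specified in the text.

```python
from typing import Callable, Dict, List, Tuple

GLOBAL, MID, DATA = 3, 2, 1   # higher layer => higher priority (assumed encoding)
Constraint = Tuple[int, Callable[[dict], bool]]


def usable_csps(csps: Dict[str, dict], constraints: List[Constraint]) -> List[str]:
    """CSPs satisfying all constraints; if none do, relax the lowest-priority
    constraints first, but never relax global-layer constraints."""
    active = sorted(constraints, key=lambda c: c[0], reverse=True)
    while True:
        ok = sorted(n for n, m in csps.items() if all(pred(m) for _, pred in active))
        if ok or not active or active[-1][0] == GLOBAL:
            return ok
        lowest = active[-1][0]
        active = [c for c in active if c[0] != lowest]
```

With a global rule excluding restricted paths and a mid-layer rule demanding more than 100 Mbps, the mid-layer rule is dropped when nothing qualifies, while the global rule always holds.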

Machine learning algorithms can exploit local contextual information (LCI) to dynamically adapt or create new policies that improve the performance of the solution. The degrees of freedom left by the global layer policies determine the areas the mid layer can improve. The learning architecture therefore also follows a hierarchical structure, according to the information available in each layer.

In a preferred embodiment, the mid layers run on the Cloud, not on the agents. The mid layers can be formed from different sub-layers, which are also organized hierarchically. The number of sub-layers depends on the contextual information to handle, the amount of intelligence to implement, and enterprise relations, among other factors.

In summary, in a preferred embodiment the mid layer performs the following functions: 1. Handle the view of the system (company-wise); 2. Manage the CSPs and tunnels; 3. Send policies and commands to the data layer; 4. Execute policies and commands from the global layer; and 5. Collect and process measurements from the data planes and lower mid layers.

In a preferred embodiment, the mid layer has the following inputs: 1. Policies from the global layer; and 2. Curated data analytics from the data layer. It has the following outputs: 1. Curated data analytics to the global layer; 2. Policies and commands to the data layer; 3. LCI; and 4. GUI user data.

In alternative embodiments, the mid layer uses machine learning algorithms to dynamically create policies and classify traffic patterns.

Global Layer

The global layer has a complete view of the system, including all enterprises, the global public Internet and external factors such as socio-political events, news, etc. Thus, the contextual information at this level is global contextual information (GCI), since it goes beyond the single-enterprise focus of the mid layers.

The global layer also guides its policies and decisions according to curated analytics received from the mid layers. These policies leave room for optimization in the lower layers, within the constraints of their rules and their degrees of freedom. Examples of global policies would be: do not use links in certain countries, avoid congestion due to a sports event, or use CSPs that go through an area with low utilization because it is a holiday or night time.
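The country-avoidance and low-utilization examples can be sketched as a filter followed by a preference ordering that still leaves the final choice to lower layers. The sketch assumes each CSP's path has already been mapped to a list of countries (for example from geolocated traceroute hops); the function name and the country codes in the usage are placeholders.

```python
from typing import Dict, List, Set


def apply_global_policy(paths: Dict[str, List[str]], banned: Set[str],
                        low_util: Set[str]) -> List[str]:
    """Return the allowed CSPs, ordered by preference.

    paths maps each CSP name to the countries its path traverses."""
    # Hard rule: drop any CSP whose path crosses a banned country.
    allowed = [n for n, hops in paths.items() if not banned.intersection(hops)]
    # Soft rule: prefer CSPs crossing a low-utilization region (holiday, night time);
    # ties are broken by name so the ordering is deterministic.
    return sorted(allowed, key=lambda n: (not low_util.intersection(paths[n]), n))
```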

In summary, in a preferred embodiment the global layer uses the following inputs: 1. Curated analytics from the mid layers; 2. External information about socio-political events; and 3. Information obtained by crawling. It has the following output: 1. Policies and commands to the mid layers.

In alternative embodiments, the global layer applies machine learning to create policies dynamically and to establish relations between world events and network metrics.

Learning

The system architecture, composed of different levels, has been described above. As also noted above, in each level some embodiments employ learning algorithms that exploit the available information to make better decisions, optimizing the performance of the overall system. The learning in each level is described below, including its main tasks, its required inputs and outputs, and its final objectives.

The learning solution for this embodiment is applied in three different layers: (i) the data layer, (ii) the mid layers, and (iii) the global layer. These three layers translate into three different stages of learning. At the data layer, there is a reactive phase based on information about CSPs and the packets being transmitted. At the mid layers, there is a local phase that relies on LCI such as cross-traffic within an enterprise. Lastly, at the global layer there is a global phase that deals with traffic flows at a higher abstraction (e.g., traffic leaving a country).

Higher layers have better visibility at the cost of larger latencies, which degrades their real-time capabilities. FIG. 15 illustrates these different phases within the architectural framework. The global layer (501) with learning module (531) is coupled to the mid layer (511) with learning modules (541, 551), which is in turn coupled to the data layer (521) with learning module (561).

An advantage of decomposing the learning into these three levels is to provide an Internet-scale solution through different time and abstraction regimes. Embodiments of the present invention benefit from the different infrastructure capabilities in each architectural layer, from agents at the bottom layer to the Cloud in the mid and global layers. Thus, when the system faces real-time constraints, workloads can be executed at the data layer, while larger sets of non-critical data are processed in the Cloud, where capacity is less of a problem.

The multi-level learning architecture of embodiments of the present invention is referred to as Hierarchical Learning (HL) and is further illustrated in FIG. 16. The global layer (601) is coupled to the mid layer (611), which is in turn coupled to the data layer (621).

Embodiments of the HL architecture use different machine learning techniques in each of the levels. For example, one level can be a Recurrent Neural Network (RNN) and the next one can implement Deep Learning (DL). The levels are connected in that the output of a lower level becomes one of the inputs of the higher level. In addition, each layer has its own set of data that complements the layer interconnection.

For example, the data layer takes reactive information as input, plus the inputs from the mid layers (e.g., policies). Consequently, each level is independent, in that it could potentially make decisions on its own to ensure the proper functioning of the learning system. For example, if the data layer and its reactive learning algorithms get disconnected from the Cloud, the local learning within the data layer can still assign flows to the best CSPs according to the information available in that layer. In this case, the depth of information is not the same as in the Cloud, but the system continues working. Subsequently, when the layers are reconnected and functioning properly, the abstraction of each level, together with the different view of the system each one has, can be exploited again.

Infrastructure capabilities become a critical parameter in HL. Since agents are hosted on heterogeneous devices with different capabilities, the learning algorithms running on each device are heavily influenced by their execution platform. For example, in some embodiments the VM is deployed on a high-end server, while in other cases an optimized agent runs on resource-constrained devices such as a mobile phone.

To avoid problems when running on heterogeneous devices, infrastructure capabilities are taken as an input in each learning level. Capabilities then influence the ML-based decisions through the feedback from one level to another. This design characteristic results in the autonomous optimization of the Hierarchical Learning according to the underlying infrastructure. If a node at the data layer can undertake more computation, it communicates this fact to the mid layer, which then leaves more degrees of freedom in its policies. Conversely, an agent can communicate its limited resources so that the policies are more constrained and thus require fewer computational resources.
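A hypothetical sketch of how reported capabilities might widen or narrow the degrees of freedom in a policy. The capability fields, the thresholds and the "top-k ranked CSPs" form of the policy are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Capabilities:
    cpu_cores: int
    mem_mb: int


def policy_for_agent(caps: Capabilities, ranked_csps: List[str]) -> List[str]:
    """Hypothetical mid-layer rule: the more an agent can compute, the more
    CSPs (degrees of freedom) its policy leaves it to choose from."""
    if caps.cpu_cores >= 8 and caps.mem_mb >= 8192:
        k = len(ranked_csps)            # high-end server: all ranked CSPs
    elif caps.cpu_cores >= 2:
        k = min(3, len(ranked_csps))    # mid-range agent: the top three
    else:
        k = 1                           # constrained device: a single CSP
    return ranked_csps[:k]
```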

The ML techniques used in each HL level make decisions based on constraints imposed by policies from higher layers. These policies are defined dynamically according to the information available in each level to manage the system; for example, they specify the assignment of flows to the different available CSPs. Each policy leaves different degrees of freedom, which the levels below exploit by applying machine learning techniques without violating those policies.

Some embodiments allow machine learning parameters (such as weights in neural networks, or probability tables in Bayes nets) to be exchanged between layers; that is, the knowledge obtained by already-trained machine learning engines is exchanged by replicating those engines. This approach allows each layer to derive policies and control commands by itself, ensuring the autonomy of the layers even if they get disconnected from the Cloud. It also enables more rapid decision making. In other words, the bottom layers gain the visibility and knowledge of the upper layers, and vice versa. Each layer endeavors to update the machine learning parameters.

For example, the mid layer sends a policy to the data layer specifying that only CSPs with a throughput larger than 10 Mbps can be used. Based on the reactive information, the data layer ML algorithms can optimize performance within that policy and the degrees of freedom it leaves. In this case, the data layer can choose from all the active CSPs that fulfill that condition. A more restrictive policy would be to use only a specific active CSP. Here, the data layer can hardly optimize anything because the policy constrains the available behaviors.

These degrees of freedom are optimized dynamically at each hierarchical level, taking into account system variables (capabilities of the agents, contextual information, network configuration, traffic interference, etc.). The HL architecture is conceptually similar to Recurrent Neural Networks (RNNs), which concatenate different iterations over a NN. However, rather than the NN always being the same, different ML techniques are concatenated, with new inputs at each level, while the feedback between them is maintained.

In embodiments of the present invention, machine learning is applied in different areas. The algorithms inside HL can apply to a wide range of areas, such as: 1. Tunnel discovery (path information with network nodes traversed based on IP address, discovery time, etc.); 2. CSP promotion and demotion; 3. CSP state transitions; 4. CSP classification according to application type, SLAs, QoE, etc.; 5. Flow and traffic classification; 6. Flow assignments to CSPs (flow division, i.e., how many sub-flows, based on contextual information, infrastructure capabilities, network conditions, etc.); 7. Anticipation of network conditions according to history, events, etc.; and 8. Policy generation (in mid and global layers). These areas are conditioned by the degrees of freedom in each architectural level imposed by policies from higher levels.

The main area in which ML techniques are applied is the assignment of flows to the available CSPs. To provide an efficient solution, this assignment needs to be dynamic and adapt to network fluctuations. The amount of data to consider is large and varies greatly over time, which poses significant challenges to the architecture. This is in contrast to CSP/tunnel discovery, which is more static.
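As a simple, non-learning baseline for dynamic flow assignment, sub-flows can be shared among CSPs in proportion to measured throughput and recomputed whenever new measurements arrive. This Python sketch uses largest-remainder rounding and a hypothetical function name; it is not the RNN or Deep Learning approach described for the HL levels, only an illustration of the assignment problem they address.

```python
from typing import Dict


def allocate(subflows: int, mbps: Dict[str, float]) -> Dict[str, int]:
    """Share `subflows` among CSPs in proportion to measured throughput,
    using largest-remainder rounding so the counts sum exactly to `subflows`."""
    total = sum(mbps.values())
    if total <= 0:
        raise ValueError("no usable throughput")
    quotas = {n: subflows * v / total for n, v in mbps.items()}
    counts = {n: int(q) for n, q in quotas.items()}
    leftover = subflows - sum(counts.values())
    # Hand the remaining sub-flows to the CSPs with the largest fractional parts.
    for n in sorted(quotas, key=lambda n: quotas[n] - counts[n], reverse=True)[:leftover]:
        counts[n] += 1
    return counts
```

Calling it again with fresh throughput measurements (for example after CSP "a" degrades) moves sub-flows to the other CSPs, which is the dynamic behavior the text requires.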

To provide optimized flow-to-CSP assignment, different ML techniques are considered for execution in the HL architecture. In some embodiments, a Recurrent Neural Network (RNN) is used at the data layer to deal with real-time reactive information. In the mid and global layers, the benefits of Deep Learning are exploited to cover local and global contextual information, respectively.

The present invention has been described above in connection with several preferred embodiments. This has been done for purposes of illustration only, and variations of the invention will be readily apparent to those skilled in the art and also fall within the scope of the invention.

1. A method of communicating a flow of information over a plurality of available connections in a system comprising a plurality of network hosts coupled to a network capable of communicating information between hosts, wherein the system utilizes a plurality of connections between the network hosts, the method comprising: receiving a request to transfer an application information flow between a first network host and a second network host; breaking the application information flow into a plurality of sub-flows; assigning each of the plurality of sub-flows to a connection from among a plurality of connections between said first network host and said second network host; communicating data from said plurality of sub-flows over one or more of said plurality of connections based on said step of assigning.
2. The method of claim 1 wherein said plurality of connections are pre-existing tunnels maintained independently from the communication of the application information flow.
3. The method of claim 1 wherein said request is associated with application characteristics comprising one or more characteristics taken from the set consisting of: packet size, flow size, flow duration, latency requirements, and priority.
4. The method of claim 1 wherein said step of assigning is based on at least one performance metric associated with at least one of said plurality of connections.
5. The method of claim 1 wherein said step of breaking changes how the application information flow is divided into sub-flows dynamically during transfer of the application information flow.
6. The method of claim 1 wherein said step of assigning changes an assignment of one or more of the plurality of sub-flows to a different connection among said plurality of connections during transfer of the application information flow.
7. The method of claim 1 wherein at least one of said plurality of sub-flows is bidirectional and each direction of said at least one of said plurality of sub-flows is assigned to different ones of said plurality of connections.
8. The method of claim 1 wherein said first network host comprises a first network interface and said second network host comprises a second network interface, and wherein at least two of said plurality of connections are established that utilize both said first network interface and said second network interface.
9. The method of claim 1 further comprising the steps of: receiving a second request to transfer a second application information flow between said first network host and said second network host; breaking the second application information flow into a second plurality of sub-flows; assigning each of the second plurality of sub-flows to a connection from among said plurality of connections between said first network host and said second network host; communicating data from each of said second plurality of sub-flows over one or more of said second plurality of connections based on said step of assigning each of the second plurality of sub-flows.
10. The method of claim 9 wherein at least one of said plurality of connections communicates data from both the application information flow and the second application information flow.
11. An apparatus for communicating information flows between network hosts over a network coupling the network hosts, the apparatus comprising: a first network host comprising at least one processor in communication with at least one memory storing processor readable instructions, wherein the at least one processor is operably configured by the processor readable instructions to: receive a request to transfer an application information flow to a second network host; break the information flow into a plurality of sub-flows; assign each of the plurality of sub-flows to a connection from among a plurality of connections between said first network host and said second network host; communicate data from said plurality of sub-flows over one or more of said plurality of connections.
12. The apparatus of claim 11 wherein said plurality of connections are pre-existing tunnels maintained independently from the communication of the application information flow.
13. The apparatus of claim 11 wherein said request is associated with application characteristics comprising one or more characteristics taken from the set consisting of: packet size, flow size, flow duration, latency requirements, and priority.
14. The apparatus of claim 11 wherein said processor is operably configured to assign each of the plurality of sub-flows to a connection based on at least one performance metric associated with at least one of said plurality of connections.
15. The apparatus of claim 11 wherein said processor is operably configured to change how the application information flow is divided into said plurality of sub-flows dynamically during transfer of the application information flow.
16. The apparatus of claim 11 wherein said processor is operably configured to change an assignment of one or more of the plurality of sub-flows to a different connection among said plurality of connections during transfer of the application information flow.
17. The apparatus of claim 11 wherein at least one of said plurality of sub-flows is bidirectional and each direction of said at least one of said plurality of sub-flows is assigned to different ones of said plurality of connections.
18. The apparatus of claim 11 wherein said first network host comprises a first network interface and said second network host comprises a second network interface, and wherein at least two of said plurality of connections are established that utilize both said first network interface and said second network interface.
19. The apparatus of claim 11 wherein the at least one processor is further operably configured by the processor readable instructions to: receive a second request to transfer a second application information flow between said first network host and said second network host; break the second application information flow into a second plurality of sub-flows; assign each of the second plurality of sub-flows to a connection from among said plurality of connections between said first network host and said second network host; communicate data from each of said second plurality of sub-flows over one or more of said second plurality of connections.
20. The apparatus of claim 19 wherein at least one of said plurality of connections communicates data from both the application information flow and the second application information flow.